Leveraging Large Language Models for Synthetic Data Generation to Enhance Adverse Drug Event Detection in Tweets.

Journal: Studies in health technology and informatics

Published Date: May 15, 2025

Abstract

Adverse drug event (ADE) detection in social media texts poses significant challenges due to the informal nature of the text and the limited availability of annotations. The scarcity of ADE named entity recognition (NER) datasets for social media hinders the development of robust ADE detection models for this type of corpus. In this paper, we leveraged the generative capabilities of large language models (LLMs) to create synthetic data, addressing this dataset gap. Specifically, we generated 17,000 tweets with ADE annotations and pre-trained NER models on this synthetic data. Our evaluations on an out-of-sample collection of 915 manually annotated tweets revealed that these models outperform state-of-the-art lexicon-based and large-scale pre-trained open NER models. We also showed that fine-tuning our synthetically pre-trained models on human-annotated data surpasses the current state of the art in ADE detection on tweets. These findings suggest that synthetic data generated by LLMs can enhance ADE detection performance, offering a promising response to the scarcity of annotated ADE datasets. The synthetic dataset is available at https://huggingface.co/datasets/anthonyyazdaniml/synthetic-ner-ade-tweets-v1.
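The abstract does not describe how the ADE annotations on the synthetic tweets are encoded. As a minimal sketch, assuming each tweet carries character-offset ADE spans (a hypothetical format not confirmed by the source), the following converts those spans into token-level BIO tags, a common input format for training NER models. The `Span` format, the `ADE` label name, the `spans_to_bio` function, and the example tweet are all illustrative assumptions; the published dataset's actual schema may differ.

```python
from typing import List, Tuple

# Hypothetical annotation format (not confirmed by the source):
# (start_char, end_char, label), with end_char exclusive.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Whitespace-tokenize `text` and assign each token a BIO tag."""
    # Recover each token's character offsets in the original text.
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end

    tagged = []
    for tok, start, end in tokens:
        tag = "O"  # outside any annotated span
        for span_start, span_end, label in spans:
            # A token is inside a span if their character ranges overlap.
            if start < span_end and end > span_start:
                # The first overlapping token opens the span (B-);
                # later tokens continue it (I-).
                tag = ("B-" if start <= span_start else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


if __name__ == "__main__":
    # Invented example tweet, not taken from the synthetic dataset.
    tweet = "this new med gave me a pounding headache lol"
    for token, tag in spans_to_bio(tweet, [(23, 40, "ADE")]):
        print(token, tag)
```

Whitespace tokenization is used here only to keep the sketch dependency-free; a real pipeline would more likely align spans with the NER model's own subword tokenizer.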

Authors

Anthony Yazdani

Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland.

Hossein Rouhizadeh

Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland. hossein.rouhizadeh@unige.ch

Alban Bornet

Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.

Douglas Teodoro

Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.

Keywords

Adverse Drug Reaction Reporting Systems; Data Mining; Drug-Related Side Effects and Adverse Reactions; Humans; Large Language Models; Natural Language Processing; Social Media

External Resources

PubMed ID (PMID): 40380573
