Can Generative LLMs Help Classify Imbalanced Real-World Data? Exploring Rare Diseases on Social Media.

Journal: Studies in health technology and informatics

Published Date: Aug 7, 2025

Abstract

Developmental and Epileptic Encephalopathies (DEEs) are rare, severe conditions often discussed by families on social media, offering valuable insights into their experiences. Identifying these messages amidst unrelated content is crucial but challenging due to data imbalance. This study evaluates different uses of generative large language models (LLMs) for binary classification of DEE-related experiences within social media posts. Using CamemBERT as a baseline, we compared two strategies: zero-shot prompt-based classification and synthetic data generation for minority class augmentation. While zero-shot prompting underperformed, the addition of 2% synthetic data improved all metrics (macro/positive F1, precision and recall). Higher proportions of synthetic data led to decreased precision. These findings underscore the potential of hybrid approaches combining fine-tuning and domain-specific synthetic data for addressing data imbalance in rare disease contexts. Further validation across models and datasets is needed.
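The abstract reports both macro F1 and positive-class F1. Under heavy imbalance these can diverge sharply. A classifier that labels every post as unrelated still reaches high accuracy and a non-trivial macro F1, yet its positive F1 is zero. The following plain-Python sketch uses the standard metric definitions to show this. The 5/95 class split, the two classifiers and the function names `prf` and `macro_f1` are hypothetical illustrations, not data or code from the study, and the abstract does not state how the authors computed their metrics.

```python
from typing import Sequence, Tuple


def prf(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for `label`, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Undefined ratios (no predicted or no true instances) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1, so the rare class counts as much as the common one."""
    return sum(prf(y_true, y_pred, lab)[2] for lab in labels) / len(labels)


# Hypothetical imbalanced sample (not from the study): 5 DEE-related posts (1) among 100.
y_true = [1] * 5 + [0] * 95

# Hypothetical classifier A labels every post as unrelated.
pred_a = [0] * 100
# Hypothetical classifier B finds 3 of the 5 positives and raises 2 false alarms.
pred_b = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

for name, pred in [("A", pred_a), ("B", pred_b)]:
    accuracy = sum(t == p for t, p in zip(y_true, pred)) / len(y_true)
    positive_f1 = prf(y_true, pred, label=1)[2]
    print(f"{name}: accuracy={accuracy:.2f} positive F1={positive_f1:.2f} "
          f"macro F1={macro_f1(y_true, pred):.2f}")
```

In this toy setting, classifier A scores 95% accuracy and a macro F1 of about 0.49 while finding none of the minority-class posts, whereas classifier B's positive F1 of 0.6 reflects its actual ability to retrieve them. This is why positive-class F1, precision and recall are reported alongside macro F1 when the class of interest is rare.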

Authors

Emma Le Priol
Clinical Bio-Informatics Laboratory, Université Paris Cité, INSERM UMR 1163, Imagine Institute, Paris, France.

Juliette Potier
Clinical Bio-Informatics Laboratory, Université Paris Cité, INSERM UMR 1163, Imagine Institute, Paris, France.

Anita Burgun
Hôpital Necker-Enfants malades, AP-HP, Paris, France.

Keywords

Humans; Natural Language Processing; Rare Diseases; Social Media

External Resources

PubMed ID (PMID): 40775945
