Detecting the clinical features of difficult-to-treat depression using synthetic data from large language models.
Journal:
Computers in Biology and Medicine
Published Date:
Jun 10, 2025
Abstract
Difficult-to-treat depression (DTD) has been proposed as a broader and more clinically comprehensive perspective on a person's depressive disorder in which, despite treatment, they continue to experience significant burden. We sought to develop a tool capable of interrogating routinely collected narrative (free-text) electronic health record (EHR) data to locate known prognostic factors, identified from the scientific literature, that capture the clinical syndrome of DTD. We thus address the upstream aspect of DTD detection: the identification of relevant factors. In this work, we use Large Language Model (LLM)-generated synthetic data (GPT-3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model. The model is trained to extract and label spans related to a variety of relevant positive and negative factors (i.e., spans of text that increase or decrease the likelihood of a patient matching the DTD syndrome). We test the model on both a synthetic and a clinical test set. On clinical EHR data, the model achieves good overall performance (0.70 F1 across polarity) when extracting as many as 20 different factors considered predictors of DTD, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD factors such as history of abuse, family history of affective disorder, illness severity and suicidality. We show that it is possible to train a model exclusively on synthetic data to extract prognostic factors from clinical data. Our results show promise for future healthcare applications, particularly those where highly confidential medical data and costly human-expert annotations would traditionally be required.
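The abstract names Non-Maximum Suppression over candidate spans but does not give implementation details. As a minimal illustrative sketch only (the Span structure, factor labels, scores and IoU threshold below are assumptions for illustration, not the authors' code), a greedy NMS step over scored candidate text spans might look like this:

    from dataclasses import dataclass

    @dataclass
    class Span:
        start: int    # offset where the candidate span begins (assumed character offsets)
        end: int      # offset one past the end of the span
        label: str    # hypothetical factor label, e.g. "suicidality"
        score: float  # model confidence for this candidate span

    def span_iou(a: Span, b: Span) -> float:
        """Intersection-over-union of two 1-D spans."""
        inter = max(0, min(a.end, b.end) - max(a.start, b.start))
        union = (a.end - a.start) + (b.end - b.start) - inter
        return inter / union if union > 0 else 0.0

    def non_maximum_suppression(candidates: list[Span], iou_threshold: float = 0.5) -> list[Span]:
        """Greedily keep the highest-scoring spans, discarding any candidate
        that overlaps an already-kept span above the IoU threshold."""
        kept: list[Span] = []
        for cand in sorted(candidates, key=lambda s: s.score, reverse=True):
            if all(span_iou(cand, k) <= iou_threshold for k in kept):
                kept.append(cand)
        return kept

    if __name__ == "__main__":
        # Toy candidates standing in for the output of a span-scoring model (values are illustrative).
        candidates = [
            Span(10, 25, "suicidality", 0.91),
            Span(12, 26, "suicidality", 0.64),  # overlaps the span above, so it is suppressed
            Span(40, 58, "family_history_affective_disorder", 0.80),
        ]
        for s in non_maximum_suppression(candidates):
            print(s)

In this sketch the scores would come from a BERT-based span scorer; NMS then resolves overlapping candidates so each region of text yields at most one extracted factor mention.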