Assessing large language models for acute heart failure classification and information extraction from French clinical notes.

Journal: Computers in biology and medicine
Published Date:

Abstract

Understanding acute heart failure (AHF) remains a significant challenge because many clinical details are recorded in unstructured text rather than as structured data in electronic health records (EHRs). In this study, we explored the use of large language models (LLMs) to automatically identify AHF hospitalizations and accurately extract AHF-related clinical information from clinical notes. Using clinical notes from Nantes University Hospital in France, we evaluated a general-purpose LLM, Qwen2-7B, against a French biomedical pretrained model, DrLongformer. We explored supervised fine-tuning and in-context learning techniques, such as few-shot and chain-of-thought prompting, and performed an ablation study to analyze the impact of data volume and annotation characteristics on model performance. DrLongformer achieved superior performance in classifying AHF hospitalizations, with an F1 score of 0.878 compared to 0.80 for Qwen2-7B, and likewise outperformed Qwen2-7B in extracting most of the clinical information. However, Qwen2-7B performed better at extracting quantitative outcomes (for example, weight and body mass index) when fine-tuned on the training set. The ablation study showed that the number of clinical notes used in training significantly influences model performance, but improvements plateaued after 250 documents. Additionally, longer annotations negatively affected model training and downstream performance. These findings highlight the potential of small language models, which can be hosted on-premises in hospitals and integrated with EHRs, to improve real-world data collection and identify complex medical conditions such as acute heart failure.
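
For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how it can be computed for a binary classification task such as "AHF hospitalization: yes/no". The function name and the toy labels are illustrative assumptions, not the study's data or evaluation code, and the abstract does not state whether a binary or averaged F1 variant was used.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive prediction: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, NOT from the study): 1 = AHF hospitalization, 0 = other.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
toy_f1 = f1_score(y_true, y_pred)
print(f"F1 on toy labels: {toy_f1:.3f}")
```

Because F1 combines precision and recall, a model that misses AHF stays or flags too many non-AHF stays is penalized in both cases, which makes it a common choice when the positive class is relatively rare.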

Authors

  • Adrien Bazoge
    Nantes Université, CHU Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000, Nantes, France; Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000, Nantes, France. Electronic address: adrien.bazoge@univ-nantes.fr.
  • Matthieu Wargny
    Nantes Université, CHU Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000, Nantes, France; Nantes Université, CHU Nantes, Département d'Endocrinologie, Diabétologie et Nutrition, l'institut du thorax, Inserm, CNRS, Hôpital Guillaume et René Laennec, F-44000, Nantes, France. Electronic address: matthieu.wargny@chu-nantes.fr.
  • Pacôme Constant Dit Beaufils
    Nantes Université, CHU Nantes, Service de neurologie, l'institut du thorax, Inserm, CNRS, Hôpital Guillaume et René Laennec, F-44000, Nantes, France. Electronic address: pacome.constantditbeaufils@chu-nantes.fr.
  • Emmanuel Morin
    Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000, Nantes, France. Electronic address: emmanuel.morin@univ-nantes.fr.
  • Béatrice Daille
    Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000, Nantes, France. Electronic address: beatrice.daille@univ-nantes.fr.
  • Pierre-Antoine Gourraud
    MS Genetics, Department of Neurology, School of Medicine, University of California San Francisco (UCSF), San Francisco, CA, USA; Université de Nantes, INSERM, UMR 1064, ATIP-Avenir, Equipe 5 Centre de Recherche en Transplantation et Immunologie, Nantes, France.
  • Samy Hadjadj
    Institut du thorax, INSERM, CNRS, Université Nantes, CHU Nantes, Nantes, France.