Comparison of local large language models for extraction of signs and symptoms data from electronic health records

Journal: medRxiv
Published Date:

Abstract

Electronic health records (EHRs) are a large source of data for research. Extraction of information from unstructured clinical notes in EHRs can be automated by large language models (LLMs). Although LLMs are promising for this task, challenges remain in applying them reliably to EHRs, including limited development and validation for languages other than English. Here, we identified Dutch LLMs and compared their performance in a case study. We selected the MedRoBERTa.nl and RobBERT models based on local applicability, Dutch language compatibility, and model architecture. We evaluated their performance in a case study on the extraction of signs and symptoms from comprehensive Dutch primary care EHRs of patients with a lower respiratory tract infection. Using manually annotated clinical notes, models were trained as direct and prompt-based classifiers with varying numbers of training samples. Performance was expressed as precision, recall, and F1-score. With 1600 training samples, the MedRoBERTa.nl and RobBERT models performed well as direct classifiers, with macro-averaged F1-scores of 0.74 (range 0.56-0.87) and 0.69 (range 0.46-0.86), respectively. The prompt-based classifiers performed considerably worse, with F1-scores of 0.08 (range 0.02-0.30) and 0.08 (range 0.02-0.22), respectively. In general, model performance was negatively affected by class imbalance and by missing information on signs and symptoms. At least 800 annotated training samples were required to obtain sufficient performance. The selected LLMs showed good performance as direct classifiers in extracting signs and symptoms from Dutch primary care EHRs.
However, prompt-based classifiers require further prompt engineering to improve their performance, and caution is warranted with imbalanced or partially missing EHR data. The MedRoBERTa.nl and RobBERT models, used as direct classifiers, can be considered for clinical research to extract information from clinical notes in Dutch primary care EHRs, potentially reducing manual annotation time and accelerating real-world research and evidence generation.
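The macro-averaged F1-score reported above is the unweighted mean of the per-class F1-scores, so a rarely occurring class that is poorly predicted pulls the score down as much as a common class does. This is one way class imbalance can depress performance. The minimal Python sketch below illustrates the computation under stated assumptions: the three-class label scheme (`present`, `absent`, `not_mentioned`) for a single sign or symptom, the toy data, and the helper names are all hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple

# Hypothetical label scheme for one sign or symptom (illustrative only).
LABELS = ["present", "absent", "not_mentioned"]

# Toy annotations and predictions; "not_mentioned" is the rare class.
Y_TRUE = ["present", "absent", "absent", "absent", "not_mentioned", "present"]
Y_PRED = ["present", "absent", "absent", "present", "absent", "present"]


def per_class_scores(y_true: Sequence[str], y_pred: Sequence[str],
                     label: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Undefined ratios (zero denominators) are set to 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1-scores (each class counts equally)."""
    scores = [per_class_scores(y_true, y_pred, lab)[2] for lab in labels]
    return sum(scores) / len(scores)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of exact matches, dominated by the frequent classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    print(f"accuracy = {accuracy(Y_TRUE, Y_PRED):.3f}")          # 0.667
    print(f"macro F1 = {macro_f1(Y_TRUE, Y_PRED, LABELS):.3f}")  # 0.489
```

In this toy example the rare `not_mentioned` class is never predicted correctly, so its F1-score is 0 and the macro-averaged F1 (0.489) falls well below plain accuracy (0.667).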

Authors

  • Isa Spiero; Merijn H. Rijk; Matthew A. Scheeres; Frans H. Rutten; Geert-Jan Geersing; Tamara N. Platteel; Karel G.M. Moons; Lotty Hooft; Johanna A.A. Damen; Roderick P. Venekamp; Artuur M. Leeuwenberg