Performance of Open-Source Large Language Models to Extract Symptoms from Clinical Notes.
Journal:
Studies in health technology and informatics
Published Date:
Aug 7, 2025
Abstract
In this study, we examined how well open-source foundational large language models (LLMs) can extract symptoms and signs (S&S), along with their corresponding ICD-10 codes, from clinical notes in the public MTSamples dataset. The dataset, comprising notes of patients with genitourinary conditions, was manually annotated so that the S&S extraction results could be compared with the outputs generated by the LLMs. We assessed three Llama-based models (Llama 3.1-13B, Llama 3.3-70B, and Me-Llama-13B), focusing on their consistency, runtime, and performance. Each model was tested on two tasks: (1) S&S extraction and (2) ICD-10 code generation. Our findings indicate that Llama 3.3-70B performed best overall. With a fast runtime and high consistency, it achieved an average recall of 0.87 and an average precision of 0.71 for S&S extraction, as well as an average recall of 0.71 and an average precision of 0.54 for ICD-10 code generation.
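To illustrate the kind of evaluation the abstract reports, below is a minimal sketch (not the authors' actual pipeline) of how per-note recall and precision for extracted S&S terms might be computed against manual annotations; the term normalization, example terms, and function name are assumptions introduced here for illustration only.

```python
def evaluate_note(predicted_terms, gold_terms):
    """Compare model-extracted S&S terms with the manual annotations for one note.

    Terms are lower-cased and stripped as a simple (assumed) normalization step.
    """
    predicted = {t.strip().lower() for t in predicted_terms}
    gold = {t.strip().lower() for t in gold_terms}
    true_positives = len(predicted & gold)
    recall = true_positives / len(gold) if gold else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical example: one annotated note versus one model output
gold = ["dysuria", "urinary frequency", "flank pain"]
model_output = ["dysuria", "flank pain", "fever"]
r, p = evaluate_note(model_output, gold)
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.67 precision=0.67
```

Averaging such per-note scores across the annotated dataset would yield figures comparable in form to the averages reported above, though the study's exact matching criteria may differ.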