Application of unified health large language model evaluation framework to In-Basket message replies: bridging qualitative and quantitative assessments.

Journal: Journal of the American Medical Informatics Association : JAMIA

Published Date: Apr 1, 2025

Abstract

OBJECTIVES: Large language models (LLMs) are increasingly utilized in healthcare, transforming medical practice through advanced language processing capabilities. However, the evaluation of LLMs predominantly relies on human qualitative assessment, which is time-consuming, resource-intensive, and may be subject to variability and bias. There is a pressing need for quantitative metrics to enable scalable, objective, and efficient evaluation.

Authors

Chuan Hong
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Anand Chowdhury
Department of Medicine, Duke University School of Medicine, Durham, North Carolina.

Anthony D Sorrentino
Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States.

Haoyuan Wang
Department of Orthopedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.

Monica Agrawal
Department of Computer Science, Stanford University, Stanford, CA, USA.

Armando Bedoya
Algorithm-Based Clinical Decision Support (ABCDS) Oversight, Office of Vice Dean of Data Science, School of Medicine, Duke University, Durham, NC, 27705, USA.

Sophia Bessias
Duke AI Health, Duke University School of Medicine, Durham, NC.

Nicoleta J Economou-Zavlanos
Duke University School of Medicine, Durham, North Carolina, USA.

Ian Wong
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, United States.

Christian Pean
Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States.

Fan Li
Department of Instrument Science and Engineering, School of SEIEE, Shanghai Jiao Tong University, Shanghai 200240, China.

Kathryn I Pollak
Cancer Prevention and Control Research Program, Duke Cancer Institute, Durham, NC 27710, United States.

Eric G Poon
Duke University Health System, Durham, NC, United States.

Michael J Pencina
Duke Clinical Research Institute, Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina.

Keywords

Artificial Intelligence; Humans; Large Language Models; Natural Language Processing; Unified Medical Language System

External Resources

PubMed ID: 40063081
