Large language models are less effective at clinical prediction tasks than locally trained machine learning models.

Journal: Journal of the American Medical Informatics Association : JAMIA

Published Date: May 1, 2025

Abstract

OBJECTIVES: To determine the extent to which current large language models (LLMs) can substitute for traditional machine learning (ML) models as clinical predictors on electronic health record (EHR) data, we investigated factors that can affect their adoption: overall performance, calibration, fairness, and resilience to privacy protections that reduce data fidelity.

Authors

Katherine E Brown
Department of Biomedical Informatics, Vanderbilt University Medical Center (VUMC), Nashville, TN 37203, United States.

Chao Yan
School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China.

Zhuohang Li
Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States.

Xinmeng Zhang
Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States.

Benjamin X Collins
Department of Biomedical Informatics, Vanderbilt University Medical Center (VUMC), Nashville, TN 37203, United States.

You Chen
Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, United States.

Ellen Wright Clayton
Law School, Vanderbilt University, Nashville, TN 37203, United States.

Murat Kantarcioglu
Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, United States.

Yevgeniy Vorobeychik
Vanderbilt University.

Bradley A Malin
Vanderbilt University, Nashville, TN, United States.

Keywords

Algorithms, Electronic Health Records, Female, Humans, Language, Large Language Models, Machine Learning, Male, Middle Aged, ROC Curve

External Resources

PubMed ID: 40056436