Large language models are less effective at clinical prediction tasks than locally trained machine learning models.

Journal: Journal of the American Medical Informatics Association (JAMIA)

Abstract

OBJECTIVES: To determine the extent to which current large language models (LLMs) can serve as substitutes for traditional machine learning (ML) as clinical predictors using data from electronic health records (EHRs), we investigated various factors that can impact their adoption, including overall performance, calibration, fairness, and resilience to privacy protections that reduce data fidelity.

Authors

  • Katherine E Brown
    Department of Biomedical Informatics, Vanderbilt University Medical Center (VUMC), Nashville, TN 37203, United States.
  • Chao Yan
    School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Zhuohang Li
    Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States.
  • Xinmeng Zhang
    Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States.
  • Benjamin X Collins
    Department of Biomedical Informatics, Vanderbilt University Medical Center (VUMC), Nashville, TN 37203, United States.
  • You Chen
    Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, United States.
  • Ellen Wright Clayton
    Law School, Vanderbilt University, Nashville, TN 37203, United States.
  • Murat Kantarcioglu
    Department of Computer Science, University of Texas at Dallas, Richardson, Texas 75080, United States.
  • Yevgeniy Vorobeychik
    Vanderbilt University.
  • Bradley A Malin
    Vanderbilt University, Nashville, TN.