Application of unified health large language model evaluation framework to In-Basket message replies: bridging qualitative and quantitative assessments.

Journal: Journal of the American Medical Informatics Association : JAMIA
Published Date:

Abstract

OBJECTIVES: Large language models (LLMs) are increasingly used in healthcare, transforming medical practice through advanced language processing capabilities. However, LLM evaluation relies predominantly on qualitative human assessment, which is time-consuming and resource-intensive and may be subject to variability and bias. Quantitative metrics are needed to enable scalable, objective, and efficient evaluation.

Authors

  • Chuan Hong
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Anand Chowdhury
    Department of Medicine, Duke University School of Medicine, Durham, North Carolina.
  • Anthony D Sorrentino
    Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States.
  • Haoyuan Wang
    Department of Orthopedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.
  • Monica Agrawal
    Department of Computer Science, Stanford University, Stanford, CA, USA.
  • Armando Bedoya
    Algorithm-Based Clinical Decision Support (ABCDS) Oversight, Office of Vice Dean of Data Science, School of Medicine, Duke University, Durham, NC, 27705, USA.
  • Sophia Bessias
    Duke AI Health, Duke University School of Medicine, Durham, NC.
  • Nicoleta J Economou-Zavlanos
    Duke University School of Medicine, Durham, North Carolina, USA.
  • Ian Wong
    Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, United States.
  • Christian Pean
    Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States.
  • Fan Li
    Department of Instrument Science and Engineering, School of SEIEE, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Kathryn I Pollak
    Cancer Prevention and Control Research Program, Duke Cancer Institute, Durham, NC 27710, United States.
  • Eric G Poon
    Duke University Health System, Durham, NC, United States.
  • Michael J Pencina
    Duke Clinical Research Institute, Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina.