Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review.

Journal: BMC Medical Informatics and Decision Making

Published Date: Nov 26, 2024

Abstract

BACKGROUND: The large language models (LLMs) released since November 30, 2022, most notably ChatGPT, have shifted attention toward their use in medicine, particularly for supporting clinical decision-making. However, there is little consensus in the medical community on how LLM performance in clinical contexts should be evaluated.

Authors

Cindy N Ho

Diabetes Technology Society, Burlingame, CA, USA.
Tiffany Tian

Diabetes Technology Society, Burlingame, CA, USA.
Alessandra T Ayers

Diabetes Technology Society, Burlingame, CA, USA.
Rachel E Aaron

Diabetes Technology Society, Burlingame, CA, USA.
Vidith Phillips

School of Medicine, Johns Hopkins University, Baltimore, MD, USA.
Risa M Wolf

Department of Pediatric Endocrinology and Diabetes, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Nestoras Mathioudakis

School of Medicine, Johns Hopkins University, Baltimore, MD, USA.
Tinglong Dai

Hopkins Business of Health Initiative, Johns Hopkins University, Washington, DC, USA.
David C Klonoff

Mills-Peninsula Medical Center, San Mateo, CA, USA.

Keywords

Artificial Intelligence; Clinical Decision-Making; Humans

External Resources

PubMed ID: 39593074
