Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review.
Journal:
BMC Medical Informatics and Decision Making
Published Date:
Nov 26, 2024
Abstract
BACKGROUND: Large language models (LLMs) released since November 30, 2022, most notably ChatGPT, have drawn growing attention to their use in medicine, particularly for supporting clinical decision-making. However, there is little consensus in the medical community on how LLM performance in clinical contexts should be evaluated.