Application of unified health large language model evaluation framework to In-Basket message replies: bridging qualitative and quantitative assessments.
Journal:
Journal of the American Medical Informatics Association : JAMIA
Published Date:
Apr 1, 2025
Abstract
OBJECTIVES: Large language models (LLMs) are increasingly utilized in healthcare, transforming medical practice through advanced language processing capabilities. However, the evaluation of LLMs predominantly relies on human qualitative assessment, which is time-consuming and resource-intensive and may be subject to variability and bias. There is a pressing need for quantitative metrics to enable scalable, objective, and efficient evaluation.