LLM-Based Medical Document Evaluation: Integrating Human Expert Insights

Journal: Studies in health technology and informatics
Published Date:

Abstract

Large Language Models (LLMs) show potential in medical document generation, but ensuring reliability requires extensive expert involvement, which limits clinical application. To address this challenge, we developed an LLM-based evaluation framework with three progressive Chain of Thought (CoT) strategies: Qualitative (expert persona), Quantitative-qualitative (error analysis), and Insight-integrated (expert reasoning). This framework captures nuanced evaluation patterns while maintaining efficiency. When tested on 33 LLM-generated Emergency Department records across five criteria, our Insight-integrated approach demonstrated strong correlation with expert evaluations (r = 0.680, p < .001), outperforming both the Qualitative (r = 0.524) and Quantitative-qualitative (r = 0.630) approaches. Our findings suggest that LLM-based evaluation frameworks can align with expert assessments and serve as practical tools for validating medical documentation in clinical settings.
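The abstract reports agreement between LLM-based and expert evaluations as Pearson correlations. As a minimal illustration of how such an agreement check can be computed, the sketch below correlates two hypothetical sets of per-document ratings; the scores shown are invented for illustration and are not the study's actual data.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 1-5 ratings for ten documents on one criterion
# (illustrative only, not the paper's data).
expert_scores = [4, 3, 5, 2, 4, 3, 5, 1, 4, 2]
llm_scores    = [4, 3, 4, 2, 5, 3, 5, 2, 4, 1]

r = pearson_r(expert_scores, llm_scores)
print(f"r = {r:.3f}")
```

In practice one would also report a p-value (e.g. via `scipy.stats.pearsonr`) and compute the correlation per evaluation criterion, as the study does across its five criteria.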

Authors

  • Junhyuk Seo
    Department of Nursing, Samsung Medical Center.
  • Dasol Choi
    Yonsei University.
  • Wonchul Cha
    SAIHST, Sungkyunkwan University.
  • Taerim Kim
    Department of Emergency Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81, Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea.