LLM-Based Medical Document Evaluation: Integrating Human Expert Insights.
Journal:
Studies in health technology and informatics
Published Date:
Aug 7, 2025
Abstract
Large Language Models (LLMs) show potential in medical document generation, but ensuring their reliability requires extensive expert involvement, which limits clinical application. To address this challenge, we developed an LLM-based evaluation framework with three progressive Chain of Thought (CoT) strategies: Qualitative (expert persona), Quantitative-qualitative (error analysis), and Insight-integrated (expert reasoning). This framework captures nuanced evaluation patterns while maintaining efficiency. When tested on 33 LLM-generated Emergency Department records across five criteria, our Insight-integrated approach demonstrated strong correlation with expert evaluations (r = 0.680, p < .001), outperforming both the Qualitative (r = 0.524) and Quantitative-qualitative (r = 0.630) approaches. Our findings suggest that LLM-based evaluation frameworks can align with expert assessments and serve as useful tools for validating medical documentation in clinical settings.
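The alignment metric reported above is a Pearson correlation between automated and expert scores. As a minimal sketch of how such a comparison is computed (the score lists below are hypothetical illustrations, not the paper's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-record scores on a 1-5 rubric (illustrative only)
llm_scores = [4, 3, 5, 2, 4, 3]
expert_scores = [4, 3, 4, 2, 5, 3]
print(round(pearson_r(llm_scores, expert_scores), 3))  # -> 0.818
```

In practice each of the paper's five criteria would yield such score pairs per record, with significance (the reported p < .001) assessed against the null hypothesis of zero correlation.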