Evaluating Large Language Model-Generated Clinical Summaries Through a Dual-Perspective Framework: Retrospective Observational Study.
Journal:
JMIR AI
Published Date:
Feb 10, 2026
Abstract
Large language models (LLMs) are increasingly used by patients and families to interpret complex medical documentation, yet most evaluations focus only on clinician-judged accuracy. In this study, 50 pediatric cardiac intensive care unit notes were summarized using GPT-4o mini and reviewed by both physicians and parents, who rated readability, clinical fidelity, and helpfulness. Parents and clinicians differed notably in their helpfulness ratings, and each group contributed distinct insights: clinicians on clinical accuracy and parents on readability. This study highlights the need for dual-perspective frameworks that balance clinical precision with patient understanding.