A Framework to Assess Clinical Safety and Hallucination Rates of LLMs for Medical Text Summarisation

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Integrating large language models (LLMs) into healthcare settings can improve workflow efficiency and patient care by automating tasks such as summarising consultations. However, ensuring the fidelity between LLM outputs and ground truth information is crucial, as errors can lead to miscommunication between patients and clinicians, resulting in incorrect diagnoses, treatment decisions and compromised patient safety. We introduce a clinician-in-the-loop framework with: 1) a clinically and technically-informed error taxonomy to classify LLM outputs, 2) an experimental structure to comprehensively and iteratively compare outputs within our LLM document generation pipeline, 3) a clinical safety framework to assess potential harms of errors in LLM outputs, and 4) an encompassing graphical user interface (GUI), CREOLA, to perform and assess all previous steps. Our clinical error metrics were derived from 18 experimental configurations involving LLMs for clinical note generation consisting of 49,590 transcript and 12,999 clinical note sentences. Overall, we observed a 1.47% hallucination rate (44% rated ‘major’) and a 3.45% omission rate (17% ‘major’). Through iterative prompts and workflow refinements, we reduced major errors below previously reported human note-taking error rates, underscoring the potential of our framework to enable safer clinical documentation.
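The abstract gives each error rate together with the share of those errors rated 'major', so the overall rate of major errors follows by multiplication. The short sketch below works through that arithmetic using only the four percentages quoted above. The variable and function names are illustrative, and it assumes each 'major' percentage is a share of the corresponding error type, which is how the abstract reads.

```python
# Percentages quoted in the abstract, expressed as fractions.
hallucination_rate = 0.0147   # sentences containing a hallucination
hallucination_major = 0.44    # share of hallucinations rated 'major'
omission_rate = 0.0345        # sentences that were omitted
omission_major = 0.17         # share of omissions rated 'major'


def major_rate(overall_rate: float, major_share: float) -> float:
    """Fraction of all sentences that carry a 'major' error of one type.

    Assumes major_share is a fraction of the errors of that type,
    not of all sentences (an interpretation of the abstract).
    """
    return overall_rate * major_share


major_hallucination_rate = major_rate(hallucination_rate, hallucination_major)
major_omission_rate = major_rate(omission_rate, omission_major)

print(f"major hallucinations: {major_hallucination_rate:.2%}")
print(f"major omissions:      {major_omission_rate:.2%}")
```

Under that reading, major hallucinations and major omissions each affect well under 1% of sentences, even though omissions are more than twice as frequent overall.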

Authors

Elham Asgari; Saleh Khalil; Nina Montaña-Brown; Magda Dubois; Jasmine Balloch; Joshua Au Yeung; Dominic Pimenta

