VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records

Journal: arXiv

Published Date: Jan 28, 2025

Abstract

Methods to ensure the factual accuracy of text generated by large language models (LLMs) in clinical medicine are lacking. VeriFact is an artificial intelligence system that combines retrieval-augmented generation and LLM-as-a-Judge to verify whether LLM-generated text is factually supported by a patient's medical history based on their electronic health record (EHR). To evaluate this system, we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course narratives from discharge summaries into a set of simple statements with clinician annotations for whether each statement is supported by the patient's EHR clinical notes. Whereas the highest agreement between clinicians was 88.5%, VeriFact achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth, suggesting that VeriFact exceeds the average clinician's ability to fact-check text against a patient's medical record. VeriFact may accelerate the development of LLM-based EHR applications by removing current evaluation bottlenecks.
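
The agreement figures quoted above compare per-statement labels from one rater with a reference labeling. As a rough illustration only, the sketch below computes simple percent agreement between two label sequences. The function name `percent_agreement` and the label strings are hypothetical, and the abstract does not state exactly how agreement was calculated, so this should not be read as the authors' metric or code.

```python
from typing import Sequence


def percent_agreement(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the percentage of positions where the two label sequences match.

    Assumption: one label per statement, aligned by position. The label
    vocabulary used in the example below is illustrative, not from the source.
    """
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if len(reference) == 0:
        raise ValueError("label sequences must not be empty")
    # Count positions where the system label equals the reference label.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Hypothetical labels for four statements from one narrative.
system_labels = ["supported", "not_supported", "supported", "supported"]
reference_labels = ["supported", "not_supported", "not_supported", "supported"]
agreement = percent_agreement(system_labels, reference_labels)
```

Raw percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic may be preferable when the label distribution is skewed.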

Authors

Philip Chung
Akshay Swaminathan
Alex J. Goodell
Yeasul Kim
S. Momsen Reincke
Lichy Han
Ben Deverett
Mohammad Amin Sadeghi
Abdel-Badih Ariss
Marc Ghanem
David Seong
Andrew A. Lee
Caitlin E. Coombes
Brad Bradshaw
Mahir A. Sufian
Hyo Jung Hong
Teresa P. Nguyen
Mohammad R. Rasouli
Komal Kamra
Mark A. Burbridge
James C. McAvoy
Roya Saffary
Stephen P. Ma
Dev Dash
James Xie
Ellen Y. Wang
Clifford A. Schmiesing
Nigam Shah
Nima Aghaeepour

External Resources

View on arXiv (http://arxiv.org/abs/2501.16672v1)
