VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records
Journal:
arXiv
Published Date:
Jan 28, 2025
Abstract
Methods to ensure factual accuracy of text generated by large language models
(LLM) in clinical medicine are lacking. VeriFact is an artificial intelligence
system that combines retrieval-augmented generation and LLM-as-a-Judge to
verify whether LLM-generated text is factually supported by a patient's medical
history based on their electronic health record (EHR). To evaluate this system,
we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course
narratives from discharge summaries into a set of simple statements with
clinician annotations for whether each statement is supported by the patient's
EHR clinical notes. Whereas highest agreement between clinicians was 88.5%,
VeriFact achieves up to 92.7% agreement when compared to a denoised and
adjudicated average human clinican ground truth, suggesting that VeriFact
exceeds the average clinician's ability to fact-check text against a patient's
medical record. VeriFact may accelerate the development of LLM-based EHR
applications by removing current evaluation bottlenecks.