A medically grounded LLM agent–based tool to detect patient safety events in medical records

Journal: medRxiv
Published Date:

Abstract

Large language models (LLMs) have shown considerable promise in medicine. While LLMs may be particularly useful in areas requiring extensive review of clinical records, their use remains limited by their tendency to hallucinate and fabricate information. Hallucinations and their consequences are exacerbated in low-probability, high-stakes scenarios such as rare adverse safety events or medical errors. We present SAFE-AI (Structured and Automated Framework for Explainable AI), a novel method for clinical decision making that combines clinical expert knowledge with LLMs in an ontology-driven model that minimizes hallucinations using strict rules. We test this method on identifying medication errors in medical charts. We collected a sample of 18,402 lines of clinical information from 300 EMS clinical charts that were independently reviewed by two expert physicians for epinephrine adverse safety events (ASEs), with 96% inter-rater agreement. We tested SAFE-AI against these labels, achieving human-like performance: 97.9% accuracy in detecting epinephrine overdoses and 91.6% accuracy in identifying delays in epinephrine administration, greatly outperforming baseline LLMs. Notably, some disagreements between clinicians and the model were found to be justifiable differences in judgment rather than errors. SAFE-AI presents a novel approach for clinical AI applications that addresses two key limitations of current machine learning methods: 1) over-reliance on probabilistic pattern recognition instead of established medical knowledge, and 2) perpetuation of biases present in training data. This framework is easily adaptable to a range of clinical applications, paving the way for provable and trustworthy AI in medicine.

LLMs have shown promise in analyzing clinical records, but their use is limited by their tendency to hallucinate and fabricate information. Misinformation could threaten patient safety and jeopardize trust.
We developed SAFE-AI (Structured and Automated Framework for Explainable AI), which combines knowledge from clinical experts with LLM inference to detect adverse safety events (ASEs) with minimal errors. We tested our method on identifying medication errors in medical charts and compared the results to reviews by expert physicians. Our method detected epinephrine delays and overdoses with a high level of accuracy. SAFE-AI presents a novel approach for clinical AI applications that overcomes two limitations: reliance on pattern recognition instead of medical knowledge, and biases present in training data.

Authors

  • Diego Trujillo; Dulin Wang; Nathan Bahr; Tina Yi-Jin Hsieh; Byeongyeon Cho; Garth Meckler; Matthew Hansen; Carl Eriksson; Kyu Seo Kim; Steven Bedrick; Xiaoqian Jiang; Jeanne-Marie Guise