Assessing Completeness of Clinical Histories Accompanying Imaging Orders Using Adapted Open-Source and Closed-Source Large Language Models.

Journal: Radiology
PMID:

Abstract

Background: Incomplete clinical histories are a well-known problem in radiology. Previous quality improvement efforts aimed at reproducible assessment of the completeness of free-text clinical histories have relied on tedious manual analysis.

Purpose: To adapt and evaluate open-source and closed-source large language models (LLMs) for automatically extracting clinical history elements from imaging orders, and to use the best-performing adapted open-source model to assess the completeness of a large sample of clinical histories as a benchmark for clinical practice.

Materials and Methods: This retrospective single-site study used previously extracted information accompanying CT, MRI, US, and radiography orders placed from August 2020 to May 2022 in the adult and pediatric emergency department of a 613-bed tertiary academic medical center. Two open-source LLMs (Llama 2-7B [Meta] and Mistral-7B [Mistral AI]) and one closed-source LLM (GPT-4 Turbo [OpenAI]) were adapted using prompt engineering, in-context learning, and fine-tuning (open-source only) to extract the elements "past medical history," "what," "when," "where," and "clinical concern" from clinical histories. Model performance, interreader agreement, and semantic similarity between model outputs and the adjudicated manual annotations of two board-certified radiologists (16 and 3 years of postfellowship experience, respectively) were assessed using accuracy, Cohen κ (none to slight, 0.01-0.20; fair, 0.21-0.40; moderate, 0.41-0.60; substantial, 0.61-0.80; almost perfect, 0.81-1.00), and BERTScore, an LLM-based metric that quantifies how well two pieces of text convey the same meaning; 95% CIs were also calculated. The best-performing open-source model was then used to assess completeness in a large dataset of unannotated clinical histories.

Results: A total of 50 186 clinical histories were included (794 training, 150 validation, 300 initial testing, 48 942 real-world application). Of the two open-source models, Mistral-7B outperformed Llama 2-7B in assessing completeness and was therefore selected for fine-tuning. Both Mistral-7B and GPT-4 Turbo showed substantial overall agreement with radiologists (mean κ, 0.73 [95% CI: 0.67, 0.78] to 0.77 [95% CI: 0.71, 0.82]) and with the adjudicated annotations (mean BERTScore, 0.96 [95% CI: 0.96, 0.97] for both models; P = .38). Mistral-7B also rivaled GPT-4 Turbo in performance (weighted overall mean accuracy, 91% [95% CI: 89, 93] vs 92% [95% CI: 90, 94]; P = .31) despite being a smaller model. Using Mistral-7B, 26.2% (12 803 of 48 942) of unannotated clinical histories were found to contain all five elements.

Conclusion: An easily deployable fine-tuned open-source LLM (Mistral-7B), rivaling GPT-4 Turbo in performance, could effectively extract clinical history elements with substantial agreement with radiologists and produce a benchmark for completeness of a large sample of clinical histories. The model and code will be fully open-sourced.

© RSNA, 2025
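The adaptation approach described above (prompt engineering and in-context learning over an open-source model) can be sketched in a few lines. The checkpoint name, prompt wording, and JSON output schema below are illustrative assumptions, not the authors' exact setup:

```python
# Sketch: prompting an open-source LLM to extract the five clinical history
# elements. Checkpoint, prompt, and output schema are assumed for illustration.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

ELEMENTS = ["past medical history", "what", "when", "where", "clinical concern"]

def extract_elements(history: str) -> dict:
    """Ask the model to identify which elements a clinical history contains."""
    prompt = (
        "From the clinical history below, extract these elements: "
        f"{', '.join(ELEMENTS)}. Return JSON mapping each element to its "
        "supporting text, or null if the element is absent.\n\n"
        f"Clinical history: {history}"
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
    reply = tokenizer.decode(
        output[0, input_ids.shape[-1]:], skip_special_tokens=True
    )
    return json.loads(reply)  # assumes the model emits valid JSON

print(extract_elements(
    "72F with h/o CHF, acute shortness of breath since this morning, "
    "concern for pulmonary edema."
))
```

In-context learning would add a few annotated example histories to the prompt; fine-tuning would instead update the model weights on the annotated training histories.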
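The reported agreement and similarity metrics have standard implementations: Cohen κ in scikit-learn and BERTScore in the bert-score package. A minimal sketch with toy labels (the study compared model outputs against adjudicated radiologist annotations):

```python
# Sketch: per-element agreement (Cohen kappa) and semantic similarity
# (BERTScore) against reference annotations. All labels here are toy data.
from sklearn.metrics import cohen_kappa_score
from bert_score import score

# Binary presence labels for one element (e.g., "clinical concern"):
# model prediction vs adjudicated reference, one entry per history.
model_labels     = [1, 0, 1, 1, 0, 1]
reference_labels = [1, 0, 1, 0, 0, 1]

kappa = cohen_kappa_score(model_labels, reference_labels)
print(f"Cohen kappa: {kappa:.2f}")  # 0.67 here; 0.61-0.80 reads as substantial

# BERTScore compares the extracted text spans with the reference spans.
candidates = ["concern for pulmonary edema"]
references = ["worried about pulmonary edema"]
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.2f}")
```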
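Once elements are extracted, the completeness benchmark itself is simple arithmetic: a history counts as complete when all five elements are present. A minimal sketch, assuming extraction results shaped like the dictionaries above:

```python
# Sketch: share of histories containing all five elements (toy data).
ELEMENTS = ["past medical history", "what", "when", "where", "clinical concern"]

def is_complete(extracted: dict) -> bool:
    """True if every element was found (non-null) in the history."""
    return all(extracted.get(e) is not None for e in ELEMENTS)

results = [
    {"past medical history": "h/o CHF", "what": "dyspnea",
     "when": "this morning", "where": None,
     "clinical concern": "pulmonary edema"},
    {"past medical history": "h/o CHF", "what": "dyspnea",
     "when": "this morning", "where": "chest",
     "clinical concern": "pulmonary edema"},
]
rate = sum(is_complete(r) for r in results) / len(results)
print(f"Complete histories: {rate:.1%}")  # study reported 26.2% (12 803 of 48 942)
```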

Authors

  • David B Larson
    Department of Radiology, Stanford University, Palo Alto, Calif.
  • Arogya Koirala
    Department of Radiology, Stanford University School of Medicine, 453 Quarry Rd, MC 5659, Stanford, CA 94304.
  • Lina Y Cheuy
    Department of Radiology, Stanford University School of Medicine, 453 Quarry Rd, MC 5659, Stanford, CA 94304.
  • Magdalini Paschali
    Department of Radiology, Stanford University School of Medicine, Stanford, CA.
  • Dave Van Veen
    Department of Electrical Engineering, Stanford University, Stanford, CA 94305.
  • Hye Sun Na
    Department of Radiology, Stanford University School of Medicine, 453 Quarry Rd, MC 5659, Stanford, CA 94304.
  • Matthew B Petterson
    Department of Radiology, Stanford University School of Medicine, 453 Quarry Rd, MC 5659, Stanford, CA 94304.
  • Zhongnan Fang
    LVIS Corporation, Palo Alto, CA.
  • Akshay S Chaudhari
    Department of Radiology, Stanford University, Stanford, CA.