Identification of high-priority radiology reports with unexpected findings using fine-tuned large language models.

Journal: European Radiology
Published Date:

Abstract

OBJECTIVE: This study aims to evaluate whether large language models (LLMs) can accurately predict the urgency and severity of radiology reports.

MATERIALS AND METHODS: Based on the recommendations of the Academy of Royal Colleges, we defined radiology reports that include unexpected findings of high urgency or severity as "high-priority (HP) radiology reports." Overall, 1906 radiology reports were used as the training set, and 176 radiology reports were used as the test set, with a balanced ratio of HP to non-HP radiology reports (1:1) in both sets. Four LLMs (Llama2 7B, Llama3 8B, Llama3 Elyza 8B, and Llama 3.1 8B) were fine-tuned using four different input settings: (1) findings only, (2) findings + referring department, (3) findings + referring department + clinical diagnosis before examination, and (4) findings + referring department + clinical diagnosis before examination + details of examination request. The fine-tuned LLMs predicted whether each radiology report was HP or not.

RESULTS: Among the four LLMs, Llama3 Elyza 8B, with inputs comprising findings and the referring department, demonstrated the best performance, achieving PRAUC = 0.962, ROCAUC = 0.968, accuracy = 0.915, sensitivity/recall = 0.932, specificity = 0.898, and F1 = 0.916. Adding the clinical diagnosis before examination and the details of the examination request did not necessarily improve performance.

CONCLUSION: The fine-tuned LLMs accurately predicted HP radiology reports, suggesting their potential utility in supporting communication regarding radiology reports with high urgency or severity.

KEY POINTS:
Question: Can large language models (LLMs) accurately identify high-priority (HP) radiology reports?
Findings: The best fine-tuned LLM accurately predicted HP radiology reports, achieving a PRAUC of 0.962 and an ROCAUC of 0.968.
Clinical relevance: This study demonstrates that fine-tuned LLMs can accurately identify HP radiology reports, potentially improving timely clinical decision-making and enhancing patient safety through faster communication of critical findings.
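
As a rough consistency check on the threshold-based metrics reported in the abstract, the sketch below recomputes accuracy, sensitivity, specificity, and F1 from a confusion matrix. The abstract does not report the confusion matrix itself; the counts used here (82 true positives, 6 false negatives, 79 true negatives, 9 false positives) are an assumption, back-calculated from the 176-report test set with a 1:1 split (88 HP, 88 non-HP) and the rounded metric values. The class and function names are illustrative only. PRAUC and ROCAUC depend on the full set of predicted scores and cannot be recovered this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, with HP as the positive class."""
    tp: int  # HP reports predicted as HP
    fn: int  # HP reports predicted as non-HP
    tn: int  # non-HP reports predicted as non-HP
    fp: int  # non-HP reports predicted as HP


def metrics(c: Confusion) -> dict:
    """Compute standard threshold-based classification metrics."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # also called recall
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / (c.tp + c.fp),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# ASSUMPTION: these counts are inferred, not reported in the abstract.
# 88 HP + 88 non-HP reports follows from the 176-report, 1:1 test set.
counts = Confusion(tp=82, fn=6, tn=79, fp=9)
result = {name: round(value, 3) for name, value in metrics(counts).items()}

for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumed counts, the rounded accuracy, sensitivity, specificity, and F1 match the values given in the RESULTS paragraph, which suggests the reported figures are mutually consistent for a balanced 88/88 test set.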

Authors

  • Akihiro Umeno
    Department of Radiology, Kobe University Graduate School of Medicine, Chuo-ku, Japan.
  • Mizuho Nishio
    Department of Diagnostic Imaging and Nuclear Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan.
  • Hidetoshi Matsuo
    Department of Radiology, Kobe University Graduate School of Medicine, Kobe, Japan.
  • Takaaki Matsunaga
    Department of Radiology, Kobe University Graduate School of Medicine, Kobe, Japan.
  • Munenobu Nogami
    Department of Radiology, Kobe University School of Medicine, 7-5-2 Kusunoki-cho, Chuo-ku, Kobe, Hyogo, 650-0017, Japan.
  • Eisuke Ueshima
    Department of Radiology, Kobe University Graduate School of Medicine, 7-5-2, Kusunoki-cho, Chuo-ku, Kobe, 650-0017, Japan.
  • Keitaro Sofue
    Department of Radiology, Kobe University Graduate School of Medicine, 7-5-2 Kusunoki-cho, Chuo-ku, Kobe City, Hyogo 650-0017, Japan.
  • Takamichi Murakami
    Department of Radiology, Kobe University Graduate School of Medicine, 7-5-2 Kusunoki-cho, Chuo-ku, Kobe City, Hyogo 650-0017, Japan.
