Cross-Institutional Evaluation of Large Language Models for Radiology Diagnosis Extraction: A Prompt-Engineering Perspective.

Journal: Journal of Imaging Informatics in Medicine

Abstract

The rapid evolution of large language models (LLMs) offers promising opportunities for radiology report annotation, aiding in determining the presence of specific findings. This study evaluates the effectiveness of a human-optimized prompt in labeling radiology reports across multiple institutions using LLMs. Six distinct institutions collected 500 radiology reports: 100 in each of five categories. A standardized Python script was distributed to participating sites, allowing the use of one common locally executed LLM with a standard human-optimized prompt. The script ran the LLM's analysis on each report and compared its predictions to reference labels provided by local investigators. Model performance was measured using accuracy, and results were aggregated centrally. The human-optimized prompt demonstrated high consistency across sites and pathologies. Preliminary analysis indicates significant agreement between the LLM's outputs and the investigator-provided reference labels across multiple institutions. At one site, eight LLMs were systematically compared, with Llama 3.1 70b achieving the highest performance in accurately identifying the specified findings. Comparable performance with Llama 3.1 70b was observed at two additional centers, demonstrating the model's robust adaptability to variations in report structures and institutional practices. Our findings illustrate the potential of optimized prompt engineering in leveraging LLMs for cross-institutional radiology report labeling. This approach is straightforward while maintaining high accuracy and adaptability. Future work will explore model robustness to diverse report structures and further refine prompts to improve generalizability.
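The workflow the abstract describes (a shared script that prompts a local LLM per report, then scores predictions against investigator labels) can be sketched as follows. This is a minimal illustration, not the study's actual script: the prompt wording, the `label_reports` and `accuracy` functions, and the `query_llm` callable are all hypothetical; in practice `query_llm` would wrap a locally executed model such as Llama 3.1 70b.

```python
# Illustrative sketch of the per-site labeling loop described in the
# abstract. All names here are assumptions for demonstration only.

# Hypothetical human-optimized prompt template (not the study's prompt).
PROMPT = (
    "You are a radiology report labeler. Answer strictly 'yes' or 'no': "
    "does the report below mention {finding}?\n\nReport:\n{report}"
)


def label_reports(reports, finding, query_llm):
    """Ask the LLM for a yes/no label on each report.

    `query_llm` is any callable taking a prompt string and returning
    the model's text response (e.g. a wrapper around a local LLM).
    """
    labels = []
    for report in reports:
        answer = query_llm(PROMPT.format(finding=finding, report=report))
        labels.append(answer.strip().lower().startswith("yes"))
    return labels


def accuracy(predictions, references):
    """Fraction of predictions matching investigator-provided labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    # Toy stand-in for a locally executed model: it only inspects the
    # report text that follows the "Report:" marker in the prompt.
    def toy_llm(prompt):
        report = prompt.split("Report:")[-1]
        return "yes" if "pneumothorax" in report.lower() else "no"

    reports = ["Small apical pneumothorax noted.", "Lungs are clear."]
    reference = [True, False]
    preds = label_reports(reports, "pneumothorax", toy_llm)
    print(accuracy(preds, reference))  # 1.0
```

In the study's setup, each site would run such a loop locally on its own 500 reports and forward only the aggregate accuracy for central comparison, keeping report text on-premises.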

Authors

  • Mana Moassefi
    Mayo Clinic Artificial Intelligence Lab, Department of Radiology, Mayo Clinic, 200 1st Street, S.W., Rochester, MN, 55905, USA.
  • Sina Houshmand
    Department of Radiology, University of California San Francisco, San Francisco, CA, USA.
  • Shahriar Faghani
    Mayo Clinic Artificial Intelligence Lab, Department of Radiology, Mayo Clinic, 200 1st Street, S.W., Rochester, MN, 55905, USA.
  • Peter D Chang
Department of Radiological Sciences and Center for Artificial Intelligence in Diagnostic Medicine, University of California Irvine, Orange, CA, USA.
  • Shawn H Sun
    Departments of Radiological Sciences and Computer Science, University of California, Irvine, CA, USA.
  • Bardia Khosravi
    Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States.
  • Aakash G Triphati
    Moffitt Cancer Center, Tampa, FL, USA.
  • Ghulam Rasool
    Moffitt Cancer Center, Tampa, FL, USA.
  • Neil K Bhatia
    Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Les Folio
    Moffitt Cancer Center, Tampa, FL, USA.
  • Katherine P Andriole
Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; MGH & BWH Center for Clinical Data Science, Boston, MA, USA.
  • Judy W Gichoya
Emory University School of Medicine, Department of Radiology, Atlanta, GA, USA.
  • Bradley J Erickson
    Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States.

Keywords

No keywords available for this article.