Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study.

Journal: Journal of Imaging Informatics in Medicine

Abstract

Large language models (LLMs) have been used successfully to extract data from free-text radiology reports. Most current studies accessed LLMs via an application programming interface (API). We evaluated the feasibility of using open-source LLMs deployed on limited local hardware to extract data from free-text mammography reports, using a common data element (CDE)-based structure. Seventy-nine CDEs reflecting real-world reporting practice were defined by an interdisciplinary expert panel. Sixty-one reports were classified by two independent researchers to establish the ground truth. Five open-source LLMs deployable on a single GPU were used for data extraction with the general-classifier Python package. Extractions were performed with five different prompting approaches, and overall accuracy, micro-recall and micro-F1 were calculated. Additional analyses applied thresholds to the relative probability of classifications. Inter-rater agreement between the manual classifiers was high (Cohen's kappa 0.83). With default prompts, the LLMs achieved accuracies of 59.2-72.9%. Chain-of-thought prompting yielded mixed results, whereas few-shot prompting decreased accuracy. Adapting the default prompts to define the classification tasks precisely improved performance for all models, with accuracies of 64.7-85.3%. Setting certainty thresholds further raised accuracies to > 90% but reduced the coverage rate to < 50%. Locally deployed open-source LLMs can effectively extract information from mammography reports while remaining compatible with limited computational resources. Selection and evaluation of the model and prompting strategy are critical, and clear, task-specific instructions appear crucial for high performance. A CDE-based framework provides clear semantics and structure for the data extraction.
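The abstract reports inter-rater agreement as Cohen's kappa and an accuracy/coverage trade-off when certainty thresholds are applied. The Python sketch below illustrates these two quantities under stated assumptions. The "relative probability" of a classification is assumed here to be the chosen option's probability divided by the summed probability of all candidate options. Items below the threshold are assumed to be left unclassified, and coverage is the fraction of items that remain. The function names are hypothetical. They are not part of the general-classifier package and are not the authors' evaluation code, and the toy labels are invented for illustration rather than taken from the study.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is perfect
    return (p_o - p_e) / (1.0 - p_e)


def relative_probability(option_probs: dict[str, float]) -> tuple[str, float]:
    """Most probable option and its share of the total probability mass.

    ASSUMPTION (hypothetical definition): relative probability =
    p(chosen option) / sum of p over all candidate options.
    """
    total = sum(option_probs.values())
    best = max(option_probs, key=option_probs.get)
    return best, option_probs[best] / total


def thresholded_accuracy(
    predictions: Sequence[tuple[str, float]],
    gold: Sequence[str],
    threshold: float,
) -> tuple[float, float]:
    """Accuracy on predictions at or above the threshold, plus the coverage rate.

    ASSUMPTION: items below the threshold are left unclassified
    (excluded from accuracy) rather than counted as errors.
    """
    kept = [
        (label, truth)
        for (label, rel_p), truth in zip(predictions, gold)
        if rel_p >= threshold
    ]
    coverage = len(kept) / len(gold)
    correct = sum(label == truth for label, truth in kept)
    accuracy = correct / len(kept) if kept else float("nan")
    return accuracy, coverage


if __name__ == "__main__":
    # Toy, invented labels (not study data).
    print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
    preds = [("a", 0.95), ("b", 0.55), ("a", 0.80), ("c", 0.40)]
    gold = ["a", "a", "a", "c"]
    print(thresholded_accuracy(preds, gold, 0.0))  # (0.75, 1.0)
    print(thresholded_accuracy(preds, gold, 0.7))  # (1.0, 0.5)
```

In this toy example, raising the threshold from 0.0 to 0.7 increases accuracy from 75% to 100% while halving coverage. This mirrors the direction of the trade-off described in the abstract.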

Authors

  • Fabio Dennstädt
    Department of Radiation Oncology, Kantonsspital St. Gallen, St. Gallen, Switzerland.
  • Simon Fauser
    Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Nikola Cihoric
    Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Switzerland.
  • Max Schmerder
    Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Paolo Lombardo
    Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Grazia Maria Cereghetti
    Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Sandro von Däniken
    Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Thomas Minder
    Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Jaro Meyer
    Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Lawrence Chiang
    Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Roberto Gaio
    Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Luc Lerch
    Medical Image Analysis Group, ARTORG Centre for Biomedical Research, University of Bern, Bern, Switzerland.
  • Irina Filchenko
    Department of Neurology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Daniel Reichenpfader
    Bern University of Applied Sciences, Institute for Medical Informatics, Quellgasse 21, Biel/Bienne, 2502, Bern, Switzerland.
  • Kerstin Denecke
    Institute for Medical Informatics, Bern University of Applied Sciences, Bern, Switzerland.
  • Caslav Vojvodic
    Wemedoo AG, Steinhausen, Switzerland.
  • Igor Tatalovic
    Wemedoo AG, Steinhausen, Switzerland.
  • André Sander
    ID Information und Dokumentation im Gesundheitswesen GmbH & Co. KGaA, Berlin, Germany.
  • Janna Hastings
    Institute for Implementation Science in Health Care, Faculty of Medicine, University of Zurich, Zürich, Zurich, Switzerland.
  • Daniel M Aebersold
    Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Freiburgstrasse, 3010, Bern, Switzerland.
  • Hendrik von Tengg-Kobligk
    Department of Diagnostic, Interventional and Pediatric Radiology, University Hospital and University of Bern, Freiburgstrasse, CH-3010 Bern, Switzerland.
  • Knud Nairz
    Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
