ESR Essentials: common performance metrics in AI - practice recommendations by the European Society of Medical Imaging Informatics.

Journal: European radiology
Published Date:

Abstract

This article provides radiologists with practical recommendations for evaluating AI performance in radiology, ensuring alignment with clinical goals and patient safety. It outlines key performance metrics, including overlap metrics for segmentation, test-based metrics (e.g., sensitivity, specificity, and area under the receiver operating characteristic curve), and outcome-based metrics (e.g., precision, negative predictive value, F1-score, Matthews correlation coefficient, and area under the precision-recall curve). Key recommendations emphasize local validation using independent datasets, selecting task-specific metrics, and considering deployment context to ensure real-world performance matches claimed efficacy. Common pitfalls, such as overreliance on a single metric, misinterpretation in low-prevalence settings, and failure to account for clinical workflow, are addressed with mitigation strategies. Additional guidance is provided on threshold selection, prevalence-adjusted evaluation, and AI-generated image quality assessment. This guide equips radiologists to critically evaluate both commercially available and in-house developed AI tools, ensuring their safe and effective integration into clinical practice.

CLINICAL RELEVANCE STATEMENT: This review provides guidance on selecting and interpreting AI performance metrics in radiology, ensuring clinically meaningful evaluation and safe deployment of AI tools. By addressing common pitfalls and promoting standardized reporting, it supports radiologists in making informed decisions, ultimately improving diagnostic accuracy and patient outcomes.

KEY POINTS:

  • Radiologists must evaluate performance metrics, as they reflect acceptable performance in specific datasets but do not guarantee clinical utility. Independent evaluation tailored to the clinical setting is essential.
  • Performance metrics must align with the intended task of the AI application (segmentation, detection, or classification) and be selected based on domain knowledge and clinical context.
  • Sensitivity, specificity, area under the ROC curve, and accuracy must be interpreted with prevalence-dependent metrics (e.g., precision, F1-score, and Matthews correlation coefficient) calculated for the target population to ensure safe and effective clinical use.
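
To illustrate why prevalence-dependent metrics need to be calculated for the target population, the following is a minimal sketch, not taken from the article. It assumes that sensitivity and specificity carry over unchanged to the local setting, which may not hold in practice. The function name `prevalence_adjusted_metrics` and all input values are hypothetical.

```python
import math


def prevalence_adjusted_metrics(sensitivity, specificity, prevalence):
    """Expected precision (PPV), NPV, F1-score and MCC at a given prevalence.

    Illustrative assumption: sensitivity and specificity are fixed properties
    of the tool and only the disease prevalence differs between settings.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    # Expected confusion-matrix cells, expressed as fractions of the population.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)

    if tp + fp == 0.0 or tn + fn == 0.0:
        raise ValueError("degenerate classifier: all predictions fall in one class")

    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ppv": ppv, "npv": npv, "f1": f1, "mcc": mcc}


# Hypothetical tool with 90% sensitivity and 90% specificity, evaluated in a
# balanced test set and in a low-prevalence (1%) screening-like population.
balanced = prevalence_adjusted_metrics(0.90, 0.90, 0.50)
screening = prevalence_adjusted_metrics(0.90, 0.90, 0.01)
print(f"balanced:  PPV={balanced['ppv']:.3f}  MCC={balanced['mcc']:.3f}")
print(f"screening: PPV={screening['ppv']:.3f}  MCC={screening['mcc']:.3f}")
```

With these hypothetical inputs, precision falls sharply in the low-prevalence population even though sensitivity and specificity are identical, which is the kind of misinterpretation in low-prevalence settings the abstract warns about.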

Authors

  • Michail E Klontzas
    Department of Medical Imaging, Heraklion University Hospital, Crete, 70110, Greece; Advanced Hybrid Imaging Systems, Institute of Computer Science, Foundation for Research and Technology (FORTH), N. Plastira 100, Vassilika Vouton 70013, Heraklion, Crete, Greece. Electronic address: miklontzas@ics.forth.gr.
  • Kevin B W Groot Lipman
    Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
  • Tugba Akinci D'Antonoli
    Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland.
  • Anna Andreychenko
    Department of Physics and Engineering, University of Information Technology, Mechanics and Optics, St Petersburg, Russia.
  • Renato Cuocolo
    Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy.
  • Matthias Dietzel
    Department of Radiology, University Hospital Erlangen, Maximiliansplatz 3, 91054, Erlangen, Germany.
  • Salvatore Gitto
    Dipartimento di Scienze Biomediche per la Salute, Università degli Studi di Milano, Milano, Italy. Electronic address: sal.gitto@gmail.com.
  • Henkjan Huisman
    Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, The Netherlands.
  • João Santinha
    Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal.
  • Federica Vernuccio
    Dipartimento di Promozione della Salute, Materno-Infantile, di Medicina Interna e Specialistica e di Eccellenza "G. D'Alessandro" (ProMISE), Università di Palermo.
  • Jacob J Visser
    Department of Radiology & Nuclear Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands.
  • Merel Huisman
    Department of Radiology, University Medical Center Utrecht, Utrecht, The Netherlands. merel.huisman1@gmail.com.