ESR Essentials: common performance metrics in AI - practice recommendations by the European Society of Medical Imaging Informatics.

Journal: European Radiology

Published Date: Aug 3, 2025

Abstract

This article provides radiologists with practical recommendations for evaluating AI performance in radiology, ensuring alignment with clinical goals and patient safety. It outlines key performance metrics, including overlap metrics for segmentation, test-based metrics (e.g., sensitivity, specificity, and area under the receiver operating characteristic curve), and outcome-based metrics (e.g., precision, negative predictive value, F1-score, Matthews correlation coefficient, and area under the precision-recall curve). Key recommendations emphasize local validation using independent datasets, selecting task-specific metrics, and considering deployment context to ensure real-world performance matches claimed efficacy. Common pitfalls, such as overreliance on a single metric, misinterpretation in low-prevalence settings, and failure to account for clinical workflow, are addressed with mitigation strategies. Additional guidance is provided on threshold selection, prevalence-adjusted evaluation, and AI-generated image quality assessment. This guide equips radiologists to critically evaluate both commercially available and in-house developed AI tools, ensuring their safe and effective integration into clinical practice.

CLINICAL RELEVANCE STATEMENT: This review provides guidance on selecting and interpreting AI performance metrics in radiology, ensuring clinically meaningful evaluation and safe deployment of AI tools. By addressing common pitfalls and promoting standardized reporting, it supports radiologists in making informed decisions, ultimately improving diagnostic accuracy and patient outcomes.

KEY POINTS:

Radiologists must evaluate performance metrics as they reflect acceptable performance in specific datasets but do not guarantee clinical utility. Independent evaluation tailored to the clinical setting is essential.

Performance metrics must align with the intended task of the AI application (segmentation, detection, or classification) and be selected based on domain knowledge and clinical context.

Sensitivity, specificity, area under the ROC curve, and accuracy must be interpreted with prevalence-dependent metrics (e.g., precision, F1-score, and Matthews correlation coefficient) calculated for the target population to ensure safe and effective clinical use.
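The point about prevalence-dependent metrics can be made concrete with a short calculation. The sketch below is not taken from the article: the function name `prevalence_adjusted_metrics` and the example values (sensitivity and specificity of 0.90, prevalences of 50%, 10%, and 1%) are hypothetical, chosen only to show how precision, negative predictive value, F1-score, and Matthews correlation coefficient shift when the same test characteristics are applied to populations with different disease prevalence.

```python
from math import sqrt


def prevalence_adjusted_metrics(sensitivity, specificity, prevalence):
    """Project prevalence-dependent metrics onto a target population.

    All inputs are fractions in [0, 1]. The confusion-matrix cells are
    expressed as expected proportions of the population, so no patient
    counts are required. (Hypothetical helper, for illustration only.)
    """
    nan = float("nan")
    tp = sensitivity * prevalence                   # true positives
    fn = (1.0 - sensitivity) * prevalence           # false negatives
    tn = specificity * (1.0 - prevalence)           # true negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false positives

    precision = tp / (tp + fp) if tp + fp > 0 else nan
    npv = tn / (tn + fn) if tn + fn > 0 else nan
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn > 0 else nan
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else nan
    return {"precision": precision, "npv": npv, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical test characteristics, evaluated at three prevalences.
    for prevalence in (0.50, 0.10, 0.01):
        metrics = prevalence_adjusted_metrics(0.90, 0.90, prevalence)
        print(prevalence, {k: round(v, 3) for k, v in metrics.items()})
```

With these assumed inputs, precision is 0.90 at 50% prevalence but falls to about 0.08 at 1% prevalence, while negative predictive value rises to about 0.999. Sensitivity and specificity are identical in every row, which is why they cannot be read in isolation in low-prevalence settings.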

Authors

Michail E Klontzas
Department of Medical Imaging, Heraklion University Hospital, Crete, 70110, Greece; Advanced Hybrid Imaging Systems, Institute of Computer Science, Foundation for Research and Technology (FORTH), N. Plastira 100, Vassilika Vouton 70013, Heraklion, Crete, Greece. Electronic address: miklontzas@ics.forth.gr.

Kevin B W Groot Lipman
Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

Tugba Akinci D'Antonoli
Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland.

Anna Andreychenko
Department of Physics and Engineering, University of Information Technology, Mechanics and Optics, St Petersburg, Russia.

Renato Cuocolo
Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy.

Matthias Dietzel
Department of Radiology, University Hospital Erlangen, Maximiliansplatz 3, 91054, Erlangen, Germany.

Salvatore Gitto
Dipartimento di Scienze Biomediche per la Salute, Università degli Studi di Milano, Milano, Italy. Electronic address: sal.gitto@gmail.com.

Henkjan Huisman
Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, The Netherlands.

João Santinha
Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal.

Federica Vernuccio
Dipartimento di Promozione della Salute, Materno-Infantile, di Medicina Interna e Specialistica e di Eccellenza "G. D'Alessandro" (ProMISE), Università di Palermo.

Jacob J Visser
Department of Radiology & Nuclear Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands.

Merel Huisman
Department of Radiology, University Medical Center Utrecht, Utrecht, The Netherlands. merel.huisman1@gmail.com.

External Resources

PubMed ID: 40753524
