Externally Tested AI Models for Malignancy Classification of Lung Nodules at Chest CT: A Systematic Review and Meta-Analysis.
Journal:
Radiology. Artificial intelligence
Published Date:
Jun 3, 2026
Abstract
Purpose: To evaluate the pooled diagnostic accuracy of externally tested AI models for malignancy classification of lung nodules on chest CT.

Materials and Methods: A systematic search of PubMed, Embase, Web of Science, CINAHL, and the Cochrane Library was performed in January 2025 to identify studies evaluating AI models for malignancy classification of lung nodules on chest CT using pathology and/or at least 2-year follow-up as reference standards. Risk of bias was assessed using QUADAS-2, and pooled sensitivity and specificity were estimated using bivariate random-effects models.

Results: Twenty-one studies including 7,454 nodules were analyzed, with lung cancer prevalence ranging from 5.7% (17/297) to 91.5% (214/234). All models were based on deep learning; 17 studies (81%) involved Asian populations, 15 (71%) used non-screening populations, 14 (67%) reported 2D or 3D CNN architectures, and eight (38%) specified predefined malignancy thresholds. High risk of bias was identified in five studies for patient selection and two for index testing. Pooled sensitivity was 88%, specificity 75%, positive likelihood ratio 3.55, negative likelihood ratio 0.16, area under the receiver operating characteristic curve 0.89, and diagnostic odds ratio 22.4. Heterogeneity was high (I² > 90%). Model architecture was associated with specificity, with higher values in studies reporting 2D or 3D CNNs compared with those without reported architecture (82-83% vs 58%, P = .03; meta-regression P = .02); other subgroup analyses showed no evidence of differences.

Conclusion: Externally tested AI models demonstrated high sensitivity but moderate specificity for malignancy classification of lung nodules on chest CT, supporting a potential role in rule-out strategies. However, substantial heterogeneity, inconsistent reporting, and risk of bias limit interpretation. ©RSNA, 2026.
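The relationship between the pooled sensitivity and specificity and the reported likelihood ratios can be illustrated with the standard point-estimate formulas. The sketch below is not the authors' analysis: the function names are hypothetical, and it applies the simple formulas to the rounded pooled values (88% and 75%). That gives roughly 3.52, 0.16, and 22.0, close to the reported 3.55, 0.16, and 22.4. The small differences are expected, because the published figures come from a bivariate random-effects model rather than from rounded inputs. The second function applies the reported negative likelihood ratio to the two extremes of the prevalence range in the abstract, a simplification that treats a pooled estimate as if it held in a single setting.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, diagnostic odds ratio) from point estimates."""
    lr_pos = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Rounded pooled estimates from the abstract (assumption: used as plain point estimates).
lr_pos, lr_neg, dor = likelihood_ratios(0.88, 0.75)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")

# Probability of malignancy after a negative AI result, using the reported LR- of 0.16
# at the lowest and highest prevalence values reported across the included studies.
for prevalence in (0.057, 0.915):
    p_neg = post_test_probability(prevalence, 0.16)
    print(f"prevalence {prevalence:.1%}: post-test probability after negative result = {p_neg:.1%}")
```

Under these assumptions, a negative result leaves a residual malignancy probability of about 1% at 5.7% prevalence but about 63% at 91.5% prevalence, which suggests that any rule-out role depends strongly on the population in which the model is used.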