Diagnostic Accuracy of Deep Learning for Automated Detection of Spinal Degenerative Disease on MRI: A Systematic Review and Meta-Analysis.

Journal: Journal of Imaging Informatics in Medicine
Published Date:

Abstract

This study aims to estimate the diagnostic accuracy of deep learning (DL) models for automated detection/classification of spinal degenerative disease (SDD) on spine MRI and to explore clinically relevant heterogeneity. We searched Ovid MEDLINE, Ovid Embase and Web of Science (January 2010 to 5 December 2025) for diagnostic accuracy studies of DL applied to spine MRI with reconstructible 2 × 2 data (TP/FP/FN/TN). Risk of bias was assessed with QUADAS-2. Pooled sensitivity and specificity were synthesised using hierarchical bivariate/HSROC models with a prespecified arm-selection hierarchy. Prespecified subgroup/sensitivity analyses examined spinal region, severity threshold, validation type and target focus. Fourteen studies (2020-2025) were included from 2363 records. Sample sizes ranged from 29 to 2991. Overall pooled sensitivity was 0.94 (95% CI 0.89-0.97) and pooled specificity was 0.95 (95% CI 0.90-0.97), with a positive likelihood ratio (LR+) of 17.5 and a negative likelihood ratio (LR-) of 0.06. Stenosis-focused studies showed lower pooled sensitivity/specificity (0.88/0.92) than studies targeting broader degenerative changes (0.96/0.96). Excluding small studies (n ≤ 50) yielded similar estimates (sensitivity 0.95; specificity 0.95; 12 studies). No study was at low risk of bias across all QUADAS-2 domains; 9/14 had ≥ 1 high-risk domain. Deeks' test showed no evidence of small-study effects (p = 0.28). DL models show high pooled accuracy for SDD detection on MRI, but clinical readiness is constrained by risk of bias, predominantly retrospective single-centre designs, subjective reference standards and limited external validation. Prospective multicentre evaluations with prespecified, clinically meaningful thresholds are needed.
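
The likelihood ratios reported above follow from sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below is illustrative only and is not the authors' analysis code. It applies those definitions to the rounded point estimates (0.94 and 0.95), which gives an LR+ of about 18.8 rather than the reported 17.5; the difference presumably reflects the unrounded model-based estimates, which is an assumption here. The pre-test probability of 0.30 is a hypothetical value chosen only to show how a likelihood ratio shifts the probability of disease.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Rounded pooled point estimates taken from the abstract.
lr_pos, lr_neg = likelihood_ratios(0.94, 0.95)
print(f"LR+ from rounded estimates: {lr_pos:.1f}")
print(f"LR- from rounded estimates: {lr_neg:.2f}")

# Hypothetical pre-test probability (an assumption, not a study value),
# combined with the likelihood ratios as reported in the abstract.
PRE_TEST = 0.30
p_after_positive = post_test_probability(PRE_TEST, 17.5)
p_after_negative = post_test_probability(PRE_TEST, 0.06)
print(f"Post-test probability after a positive result: {p_after_positive:.2f}")
print(f"Post-test probability after a negative result: {p_after_negative:.3f}")
```

Under that hypothetical 30% pre-test probability, a positive DL result would raise the probability of disease to roughly 88%, and a negative result would lower it to roughly 2.5%.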
