Convolutional neural networks in paediatric fracture detection: pooled evidence from a systematic review and meta-analysis.

Journal: European radiology

Abstract

OBJECTIVE: The objective of this review was to systematically evaluate the diagnostic accuracy of artificial intelligence (AI) models for detecting paediatric appendicular fractures on plain radiographs.

MATERIALS AND METHODS: This review followed the PRISMA-DTA guidelines. MEDLINE, Scopus, Cochrane Library, and Web of Science were searched from inception to May 2025. Eligible studies included paediatric patients (< 21 years) where AI models assessed plain radiographs for fractures, using human readers as the reference standard. Primary outcomes were pooled sensitivity, specificity, diagnostic odds ratio (DOR), positive likelihood ratio (LR+), and negative likelihood ratio (LR⁻). The risk of bias was assessed using QUADAS-2. Random-effects models and hierarchical summary receiver operating characteristic (HSROC) curves were applied.

RESULTS: Seventeen studies met the inclusion criteria, with 11 contributing to the meta-analysis (over 10,000 radiographs). Pooled sensitivity was 0.92 (95% CI: 0.89-0.94), and specificity was 0.90 (95% CI: 0.85-0.94), corresponding to a false-positive rate of 0.10. The HSROC curve demonstrated high overall discriminative ability. Subgroup analyses showed comparable diagnostic performance for upper extremity fractures (sensitivity 0.91, specificity 0.89) and lower extremity fractures (sensitivity 0.89, specificity 0.94). The pooled DOR was 104.6, LR+ was 9.32, and LR⁻ was 0.089. Most studies had a low risk of bias, though many were retrospective and single-centre with limited external validation.

CONCLUSION: AI models, particularly deep learning architectures, demonstrate high diagnostic accuracy for detecting paediatric appendicular fractures on radiographs, approaching expert-level performance and improving the diagnostic abilities of junior clinicians. However, broader clinical adoption requires robust external validation and prospective integration into clinical workflows.
KEY POINTS:

Question: What is the diagnostic accuracy of artificial intelligence models for detecting paediatric appendicular fractures on plain radiographs?

Findings: AI models showed high diagnostic accuracy for paediatric appendicular fractures, with a pooled sensitivity of 0.92, specificity of 0.90, strong HSROC performance, and consistent results across limb subgroups.

Clinical relevance: AI-assisted fracture detection may improve diagnostic accuracy, support junior clinicians, and reduce delays in identifying paediatric appendicular fractures, enhancing patient safety and enabling faster, more efficient care pathways in emergency and outpatient settings.
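
The reported likelihood ratios and diagnostic odds ratio can be related to the pooled sensitivity and specificity through their textbook definitions. The sketch below is illustrative only and is not the authors' analysis code. The function name `likelihood_ratios` is hypothetical, and the inputs are the pooled point estimates quoted in the abstract. Plugging in 0.92 and 0.90 gives approximately LR+ 9.2, LR⁻ 0.089, and DOR 103.5, close to but not identical with the reported 9.32, 0.089, and 104.6. Small differences of this kind are expected, since each summary measure in a meta-analysis is typically pooled with its own model rather than derived from the pooled sensitivity and specificity.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float, float]:
    """Return (LR+, LR-, DOR) for a single sensitivity/specificity pair.

    Textbook definitions:
      LR+ = sensitivity / (1 - specificity)
      LR- = (1 - sensitivity) / specificity
      DOR = LR+ / LR-
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


if __name__ == "__main__":
    # Pooled point estimates quoted in the abstract.
    lr_pos, lr_neg, dor = likelihood_ratios(0.92, 0.90)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, DOR = {dor:.1f}")
```

Read this way, a positive AI result multiplies the pre-test odds of fracture roughly ninefold, and a negative result reduces them to about one eleventh.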
