Comparative study of the performance of two versions of the same AI tool for pediatric skeletal maturation assessment.
Journal: European Radiology
Published Date: Jan 7, 2026
Abstract
OBJECTIVES: To compare BoneXpert version 3 with version 2 in estimating bone age and bone health index (BHI), focusing on mean absolute error (MAE) and root mean square error (RMSE) in healthy children.

MATERIALS AND METHODS: This retrospective study included 449 healthy children: 231 females aged 2.11-15.88 years (mean 8.84) and 218 males aged 3.09-15.94 years (mean 9.55). Bone age was assessed using both versions. Chronological age was recorded, and correlations between estimated and chronological ages were calculated (R²). Accuracy was evaluated using MAE and RMSE, with analyses stratified by sex and age group. Bone age standard deviation scores (SDS), the BHI, and their variability were compared between versions.

RESULTS: Both versions showed strong correlations with chronological age (R² = 0.93 for females, 0.91 for males in both). MAE was 0.89 years (95% CI: 0.07) for version 2 and 0.88 years (95% CI: 0.07) for version 3 (p > 0.05). RMSE increased with age and was higher in males. Overall RMSE was 1.15 (95% CI: 0.04) in version 2 and 1.12 (95% CI: 0.04) in version 3. Bone age SDS was higher with version 3 (mean 0.56) than with version 2 (mean 0.19) and more variable (SDS 1.51 vs. 1.29). Version 3 also provided SDS in the youngest age group. No significant differences were observed in the BHI or its SDS.

CONCLUSION: Both BoneXpert versions are effective for bone age assessment in healthy children, with similar accuracy. Version 3 produces higher and more variable bone age SDS values and extends SDS reporting to younger ages.

KEY POINTS:
Question: The reliability of the new version of the AI-supported program for bone age estimation in children needs to be evaluated.
Findings: In a sample of 449 healthy children, version 3 yields higher and more variable bone age SDS values and expands SDS availability to younger ages.
Clinical relevance: This comparative study draws the reader's attention to the fact that the modifications applied to the AI-supported program for bone age estimation are not always available to the users, and that its reliability and accuracy need to be validated.
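
For readers unfamiliar with the accuracy metrics named in the abstract, the following is a minimal sketch of how MAE, RMSE, and R² can be computed from paired bone age estimates and chronological ages. It is not the authors' analysis code: the function names are illustrative, the sample values are invented for demonstration and are not study data, and R² is assumed here to be the squared Pearson correlation, which the abstract does not specify.

```python
import math

def mae(estimated, chronological):
    """Mean absolute error between estimated and chronological ages (years)."""
    errors = [abs(e - c) for e, c in zip(estimated, chronological)]
    return sum(errors) / len(errors)

def rmse(estimated, chronological):
    """Root mean square error; penalizes large deviations more than MAE."""
    squared = [(e - c) ** 2 for e, c in zip(estimated, chronological)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(estimated, chronological):
    """Squared Pearson correlation (assumed definition of R² for this sketch)."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_c = sum(chronological) / n
    s_ec = sum((e - mean_e) * (c - mean_c) for e, c in zip(estimated, chronological))
    s_ee = sum((e - mean_e) ** 2 for e in estimated)
    s_cc = sum((c - mean_c) ** 2 for c in chronological)
    return (s_ec ** 2) / (s_ee * s_cc)

# Hypothetical values for illustration only; NOT data from the study.
bone_age = [5.0, 8.5, 10.0, 13.0]
chron_age = [4.0, 9.0, 10.5, 12.0]

print(f"MAE:  {mae(bone_age, chron_age):.2f} years")
print(f"RMSE: {rmse(bone_age, chron_age):.2f} years")
print(f"R^2:  {r_squared(bone_age, chron_age):.2f}")
```

RMSE is never smaller than MAE for the same data, which is consistent with the abstract reporting overall RMSE values (1.15 and 1.12) above the corresponding MAE values (0.89 and 0.88).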