Reliability of Multimodal AI for Assessing Preclinical Stainless Steel Crown Preparations: A Comparative Study With Human Experts.
Journal: International Journal of Paediatric Dentistry
Published Date: Nov 6, 2025
Abstract
BACKGROUND: Artificial intelligence has the potential to enhance consistency and objectivity in preclinical pediatric dentistry assessments.

AIM: To evaluate the reliability of multimodal artificial intelligence (AI) models (GPT-4o, Claude-3.7-Sonnet-Reasoning, o4-mini, DeepSeek-R1, DeepSeek-V3, and o3) compared to human experts in assessing stainless steel crown (SSC) preparations.

DESIGN: This cross-sectional study analyzed 133 SSC preparations (27 mandibular first primary molars, 106 mandibular second primary molars) from dental students. Five photographs were captured for each preparation, and each preparation was scored with a rubric covering occlusal reduction, proximal reduction, and finishing criteria. Images were analyzed using a Reflection-of-Thought prompt, and the AI assessments were compared with human assessments at a significance level of p < 0.05.

RESULTS: Claude-3.7-Sonnet-Reasoning demonstrated exceptional agreement with human experts (intraclass correlation coefficient, ICC = 0.89) across all preparations, with consistent performance by tooth type. o4-mini showed moderate agreement (ICC = 0.57), GPT-4o weak agreement (ICC = 0.06), and o3 no agreement (ICC = -0.03), while the DeepSeek models achieved 0% task completion. Error analysis revealed proximal reduction errors as the most common (39.2%), followed by finishing (33.6%) and occlusal reduction (27.1%), with significant variation in error detection between assessors, particularly for second primary molars.

CONCLUSIONS: Claude-3.7-Sonnet-Reasoning demonstrates human-expert-level reliability in assessing SSC preparations. AI models offer promising complementary approaches to standardize preclinical pediatric dentistry assessments, provide immediate feedback, and reduce faculty workload.
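The agreement figures above are reported as ICC values. The abstract does not state which ICC form the authors used, so the following is only a minimal sketch of one common choice, ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from a table with one row per preparation and one column per assessor. The function name `icc2_1` and the scores are hypothetical illustrations, not data or code from the study.

```python
def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per rated item (e.g. a preparation),
    each row holding one score per rater (e.g. human expert, AI model).
    This ICC form is an assumption; the abstract does not specify one.
    """
    n = len(scores)        # number of rated items
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-item mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical rubric totals: column 0 = human expert, column 1 = AI model.
example_scores = [
    [8, 7],
    [5, 5],
    [9, 9],
    [4, 6],
    [7, 7],
]
print(f"ICC(2,1) = {icc2_1(example_scores):.2f}")
```

Under this form, a systematic offset between assessors lowers the coefficient, so a model that ranks preparations like the human experts but scores them consistently higher or lower would still show reduced agreement.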