The Performance of Artificial Intelligence in Providing Real-Time Aid in Emergency Dental Trauma: A Clinical Validation Study.

Journal: Dental traumatology : official publication of International Association for Dental Traumatology
Published Date:

Abstract

BACKGROUND: Non-experts who search online for dental emergency treatment may find unreliable guidance. We prospectively tested ChatGPT-4o, the first publicly available multimodal large language model, on real emergency-department avulsion cases to determine whether it delivers guideline-correct, time-critical directions within seconds.
METHODS: Seventy-eight anonymized avulsion charts (42 permanent, 36 primary teeth; 39 dry, 39 moist; 40 immature roots) were rewritten as lay prompts. ChatGPT-4o generated a single response to each vignette in each of two sessions held 14 days apart (156 responses). Three oral and maxillofacial surgeons (OMFS) scored diagnostic accuracy, immediate action, contraindication identification, and completeness. Three lay assessors scored clarity (0-15 composite rating). An additional time-critical safety flag required simultaneous accuracy in immediate action and contraindication advice. Statistical analysis was performed at a 95% confidence level.
RESULTS: ChatGPT-4o demonstrated significant rates of accurate guidance. Inter-rater reproducibility was near perfect (ICC = 0.94; κ = 0.88-0.998). The median composite score was 13 (IQR 12-14). Permanent dentition increased the probability of perfect diagnostic, contraindication, and immediate-action scores (p ≤ 0.046), whereas extra-oral dry time lowered immediate-action scores (p = 0.003) and completeness (p = 0.023). Root maturity had no effect. Clarity was rated at more than 93% in both sessions. The safety flag was met in 81% and 89% of cases across the two sessions (χ² = 6.73, p = 0.009), leaving about one in eight situations potentially unsafe.
CONCLUSIONS: This first clinical validation of ChatGPT-4o demonstrates expert-level, reproducible triage for tooth avulsion and introduces the "time-critical safety" composite as a strict benchmark for emergency chatbots. Guideline-linked retrieval is still needed before unsupervised deployment.
Clinically, these findings show that while ChatGPT can offer quick and largely accurate advice, the remaining deficiencies highlight the risk of incomplete or unsafe guidance during emergencies.
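
As an illustration only, the composite rating and the "time-critical safety" flag described in the abstract could be sketched as follows. The abstract does not give the per-domain scale, so this sketch assumes that each of the five domains is scored 0-3 (summing to the 0-15 composite) and that "accuracy" means a full score in the domain. The class and field names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Assumption: each domain is scored 0-3, so five domains give a 0-15 composite.
# The abstract reports only the 0-15 range, not the per-domain scale.
MAX_DOMAIN_SCORE = 3


@dataclass(frozen=True)
class ResponseScores:
    """Hypothetical container for one rated chatbot response."""
    diagnostic_accuracy: int
    immediate_action: int
    contraindication: int
    completeness: int
    clarity: int

    def composite(self) -> int:
        # Sum of all five domain scores (0-15 under the assumed scale).
        return (self.diagnostic_accuracy + self.immediate_action
                + self.contraindication + self.completeness + self.clarity)

    def time_critical_safe(self) -> bool:
        # The flag requires BOTH immediate-action and contraindication
        # advice to be fully accurate in the same response.
        return (self.immediate_action == MAX_DOMAIN_SCORE
                and self.contraindication == MAX_DOMAIN_SCORE)


# Invented example values, not study data.
complete_but_risky = ResponseScores(3, 2, 3, 3, 3)
incomplete_but_safe = ResponseScores(3, 3, 3, 2, 3)
```

Under these assumptions both example responses reach the same composite of 14, yet only the second passes the safety flag, which is why the flag is a stricter benchmark than the composite alone.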

Authors

  • Nadav Grinberg
    Department of Otolaryngology, Head and Neck Surgery and Maxillofacial Surgery, Tel-Aviv Sourasky Medical Center, 64239, Tel Aviv, Israel.
  • Shimrit Arbel
    Senior Surgeon, Department of Oral and Maxillofacial Surgery, Tel-Aviv Sourasky Medical Center, Tel Aviv, Israel.
  • Yana Yarden Boyadjiev
    Department of Oral and Maxillofacial Surgery, Grey Faculty of Medicine, Tel-Aviv Sourasky Medical Center, Tel Aviv University, Tel Aviv, Israel.
  • Clariel Ianculovici
    Senior Surgeon, Department of Oral and Maxillofacial Surgery, Tel-Aviv Sourasky Medical Center, Tel Aviv, Israel.
  • Shlomi Kleinman
    Department Head, Department of Oral and Maxillofacial Surgery, Tel-Aviv Sourasky Medical Center, Tel Aviv, Israel.
  • Oren Peleg
    Senior Surgeon, Department of Oral and Maxillofacial Surgery, Tel-Aviv Sourasky Medical Center, Tel Aviv, Israel; Senior Surgeon, Department of Oral and Maxillofacial Surgery, Goldschleger School of Dental Medicine, Tel-Aviv University, Tel-Aviv, Israel.
