Assessment of the Accuracy of Modern Artificial Intelligence Chatbots in Responding to Endodontic Queries.

Journal: Australian Endodontic Journal: The Journal of the Australian Society of Endodontology Inc
Published Date:

Abstract

This study aimed to compare the accuracy of modern AI chatbots, namely Gemini 1.5 Flash, Gemini 1.5 Pro, ChatGPT-3.5 and ChatGPT-4, in answering endodontic questions and to evaluate their potential to support clinicians. Forty yes/no questions covering 12 endodontic topics were formulated by three experts. Each question was presented to each AI model on the same day, with a new chat session initiated for every question. Agreement between chatbot responses and the expert consensus was assessed using Cohen's kappa test (p < 0.05). ChatGPT-3.5 demonstrated the highest accuracy (80%), followed by ChatGPT-4 (77.5%), Gemini 1.5 Pro (72.5%) and Gemini 1.5 Flash (60%). Agreement levels ranged from weak (ChatGPT models) to minimal (Gemini 1.5 Flash). The findings indicate variability in chatbot performance, with the ChatGPT models outperforming the Gemini models. However, reliance on AI-generated responses for clinical decision-making remains questionable. Future studies should incorporate more complex clinical scenarios and broader analytical approaches to strengthen the assessment of AI chatbots in endodontics.
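
As an illustrative sketch of the agreement analysis described above, the following Python snippet computes raw accuracy and Cohen's kappa between a chatbot's yes/no answers and the expert consensus. The answer lists are hypothetical placeholders (not data from the study), and the snippet assumes scikit-learn is available.

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical yes/no answers for 10 questions; not the study's actual data.
    expert_consensus = ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
    chatbot_answers  = ["yes", "no", "no",  "yes", "no", "yes", "yes", "no", "yes", "no"]

    # Raw accuracy: proportion of chatbot answers matching the expert consensus.
    accuracy = sum(a == b for a, b in zip(chatbot_answers, expert_consensus)) / len(expert_consensus)

    # Cohen's kappa: chance-corrected agreement between the two "raters"
    # (chatbot vs. expert consensus).
    kappa = cohen_kappa_score(expert_consensus, chatbot_answers)

    print(f"Accuracy: {accuracy:.2f}")
    print(f"Cohen's kappa: {kappa:.2f}")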

Authors

  • Melis Çakar
    Department of Endodontics, Faculty of Dentistry, Erciyes University, Kayseri, Türkiye.
  • Ayşe Tuğba Eminsoy Avcı
    Department of Endodontics, Faculty of Dentistry, Erciyes University, Kayseri, Türkiye.
  • Salih Düzgün
    Department of Endodontics, Faculty of Dentistry, Erciyes University, Kayseri, Türkiye.
  • Tuğrul Aslan
    Department of Endodontics, Faculty of Dentistry, Erciyes University, Kayseri, Türkiye.
  • Kübra Nur Hekimoğlu
    Department of Endodontics, Faculty of Dentistry, Erciyes University, Kayseri, Türkiye.
