Alignment of artificial intelligence-generated responses with systematic reviews in implant prosthodontics.

Journal: The Journal of prosthetic dentistry
Published Date:

Abstract

STATEMENT OF PROBLEM: Large language model (LLM)-based artificial intelligence (AI) platforms have emerged as tools to support clinical decision-making in dentistry, but their alignment with high-level evidence from systematic reviews in implant prosthodontics remains unclear.

PURPOSE: The purpose of this study was to evaluate the degree of alignment between responses generated by ChatGPT and Google Gemini and the conclusions of published systematic reviews in implant prosthodontics.

MATERIAL AND METHODS: Systematic reviews published between 2023 and 2025 addressing clinical questions in implant prosthodontics were included, with their conclusions used as reference standards and operationalized as expected-answer statements. Methodological quality of the included reviews was assessed using A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR 2). Standardized population, intervention, comparison, outcome (PICO)-based questions were submitted to ChatGPT and Google Gemini using identical prompts and no prior context. Agreement between AI responses and review conclusions was scored on a 5-point Likert scale by 2 blinded evaluators, with interrater reliability assessed using weighted Cohen kappa. Platform comparisons used the Wilcoxon matched-pairs signed-rank test, and domain analyses used the Kruskal-Wallis test with Dunn post hoc comparisons (α=.05).

RESULTS: Seventy-four systematic reviews were included and categorized into 5 prosthodontic domains. Both ChatGPT and Google Gemini showed high agreement across domains, with no significant differences between platforms or domains (P>.05). Interrater agreement was almost perfect (κ=0.88-0.97). Although agreement was similar, ChatGPT more often reported moderate certainty, whereas Google Gemini more frequently expressed high certainty.

CONCLUSIONS: ChatGPT and Google Gemini showed high agreement with systematic review conclusions in implant prosthodontics. Differences in certainty expression highlighted the need for cautious interpretation and professional oversight.
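
ILLUSTRATIVE NOTE: The interrater statistic named in the methods, weighted Cohen kappa, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' analysis code: the abstract does not say whether linear or quadratic weights were used, so linear weights are assumed as the default, and the function name `weighted_kappa` and the example ratings are hypothetical and do not come from the study.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5), weights="linear"):
    """Weighted Cohen kappa for two raters scoring the same items on an ordinal scale.

    Assumption: "linear" weights by default; pass weights="quadratic" for squared weights.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")

    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: rows are rater A's scores, columns rater B's.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Distance between two scale points, normalized to the range 0..1.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted disagreement actually observed, and expected under independence.
    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))

    if exp == 0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return 1.0 - obs / exp


# Hypothetical 5-point Likert scores from two evaluators (not study data).
evaluator_1 = [5, 4, 5, 3, 4, 5, 2, 5]
evaluator_2 = [5, 4, 4, 3, 4, 5, 3, 5]
print(weighted_kappa(evaluator_1, evaluator_2))
```

Weighting matters on a Likert scale because a 4-versus-5 disagreement is penalized less than a 1-versus-5 disagreement, which an unweighted kappa would treat identically.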

Authors
