Guideline-based clinical reasoning in periodontology education: a comparative study of residents and large language models.

Journal: BMC medical education
Published Date:

Abstract

OBJECTIVE: In healthcare education, clinical practice guidelines play a central role in developing clinical reasoning skills by providing structured, evidence-based decision-making frameworks. Successful management of peri-implantitis requires not only the acquisition of knowledge but also the ability to interpret and apply guideline recommendations within a clinical context. Large language models (LLMs), as artificial intelligence systems capable of generating clinically meaningful responses, have recently attracted attention; however, their performance relative to learners at different levels of clinical training has not yet been sufficiently clarified. This study compared the performance of different LLMs with that of periodontology residents within a guideline-based clinical reasoning framework. METHODS: Based on the European Federation of Periodontology (EFP) clinical practice guideline for peri-implantitis, a total of 46 assessment items comprising multiple-choice questions (MCQs) and short-answer constructed-response questions (CRQs) were developed. The questions were structured according to the five clinical stages of peri-implantitis management. Four LLMs (ChatGPT-5.1, Gemini 1.5 Flash, DeepSeek V3.2 and Claude Sonnet 4.5) and periodontology residents at early, intermediate and advanced training levels were assessed using a standardised application protocol. In quantitative analyses, performance was compared across groups; in qualitative assessments, academic staff examined the clinical consistency, clarity, and clinical appropriateness of the explanatory responses. FINDINGS: In multiple-choice questions, residents and LLMs performed similarly overall, suggesting comparable levels of success in recognising structured, guideline-based recommendations. In contrast, performance differences became more pronounced in open-ended questions requiring explanation, justification and the application of knowledge within a clinical context. Although an overall difference in performance was observed across resident training levels in open-ended questions, multiple-comparison analyses detected no statistically significant differences between specific resident groups. LLMs demonstrated stronger performance, particularly in tasks requiring the structured expression of guideline-based reasoning. Qualitative assessments also revealed differences between models in explanation organisation and clinical consistency, particularly in tasks requiring higher levels of interpretation. CONCLUSION: Large language models can serve as valuable educational tools for organising and interpreting structured clinical information within guideline-based learning environments. However, the outputs of these systems should not be regarded as a direct substitute for real clinical reasoning, contextual decision-making, or experiential clinical judgement. Their educational integration should therefore take place under the supervision of teaching staff and within a framework of critical evaluation. Hybrid educational models combining human-led clinical training with supervised AI-assisted learning can contribute to periodontology education whilst preserving the development of independent clinical reasoning skills.

Authors
