Performance of large language models (ChatGPT4-0, Grok2 and Gemini) in UK dentistry and dental hygiene and therapy assessments.

Journal: British Dental Journal

Published Date: Jun 20, 2025

Abstract

Background: Artificial intelligence, particularly large language models (LLMs), has demonstrated capabilities in performing complex natural language processing tasks. This study evaluated the performance of LLMs in answering dentistry assessment questions and their ability to generate examination questions suitable for undergraduate dentistry.

Methods: GPT-4o, Grok2 and Gemini were tested on 340 multiple choice questions (MCQs), 80 short-answer question papers (SAPs) and three structured oral examinations within the Bachelor of Dental Surgery (BDS) and dental hygiene and therapy (DHT) programmes. The LLMs also generated 140 assessment questions.

Results: In the BDS cohort, no significant differences were observed between LLMs in MCQ performance (p = 0.71) or SAP performance (p = 0.07). In the DHT cohort, significant differences were noted in SAP performance (p = 0.04), with GPT-4o and Grok2 outperforming Gemini (p = 0.01 and p < 0.001, respectively). Question generation revealed that while LLMs produced appropriately worded questions in most categories, issues such as double negatives, lengthy narratives and inaccurate information emerged, particularly in hard topics and in constructing mark schemes.

Conclusion: GPT-4o and Grok2 demonstrated potential in answering dental assessment questions. All LLMs showed limitations in generating high-quality examination content.

Authors

Manas Dave

Lecturer in Dental Education, University of Manchester, Manchester, United Kingdom. manas.dave@manchester.ac.uk

Rajpal Tattar

School of Clinical Dentistry, The University of Sheffield, UK.

Rasha Alafaleg

Division of Dentistry, The University of Manchester, UK; Department of Dental Education, College of Dentistry, Qassim University, Malida, Qassim, Saudi Arabia.

Siobhan Barry

Division of Dentistry, The University of Manchester, UK.

Senathirajah Ariyaratnam

Division of Dentistry, The University of Manchester, UK.

Reza Vahid Roudsari

Division of Dentistry, The University of Manchester, UK.

Neil Patel

Consultant and Senior Lecturer in Oral Surgery, University of Manchester, Manchester, United Kingdom.

External Resources

PubMed ID (PMID): 40542155
