Human versus artificial intelligence in oral pathology diagnosis: a comparative study of ChatGPT, Grok, and MANUS.
Journal: Scientific Reports
Published Date: Feb 25, 2026
Abstract
Artificial intelligence (AI) integration in diagnostic medicine has advanced accuracy and efficiency, particularly in pathology. This study assessed the diagnostic performance of three large language models (LLMs), namely ChatGPT (GPT-4-turbo), Grok (xAI), and MANUS, in interpreting histopathology slides of oral lesions. A comparative diagnostic study was conducted using 100 high-resolution slides representing diverse oral pathologies. Images were sourced from a validated textbook and reviewed by two board-certified oral pathologists, who provided consensus diagnoses. Each of the three AI models analysed every slide twice using standardized prompts. Diagnostic accuracy, intra-model consistency, inter-model concordance, and agreement with human experts were evaluated using descriptive statistics, Cohen's kappa, McNemar's test, and chi-square analysis. All AI models demonstrated high diagnostic accuracy. In the second round, Grok achieved the highest accuracy (97%), followed by MANUS (96%) and ChatGPT (94%). ChatGPT showed the highest intra-model consistency (κ = 0.918), while MANUS and Grok showed substantial agreement between rounds (κ = 0.790 and 0.740, respectively). Expert pathologists achieved 98% accuracy. Comparisons between AI model and human diagnoses showed moderate to substantial agreement, with MANUS most closely aligned with the experts. Most misclassifications occurred in histologically ambiguous cases, with no statistically significant differences among the AI models. Multimodal LLMs demonstrated strong diagnostic capability, consistency, and alignment with expert reasoning in oral histopathology interpretation. Grok was the most accurate, ChatGPT the most consistent, and MANUS the most expert-aligned. These findings support integrating AI into digital pathology for diagnostic support, education, and quality assurance, although further validation on clinical datasets is recommended.
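The agreement statistics named in the abstract can be illustrated with a short sketch. Because the abstract does not give the computational details, it assumes that intra-model consistency is Cohen's kappa over one model's round-1 versus round-2 diagnosis labels for the same slides, that McNemar's test compares two models' paired correct/incorrect outcomes on the same slides (here in its exact binomial form, which suits the small number of discordant pairs expected at 94 to 97% accuracy), and that the verbal labels ("moderate", "substantial") follow the conventional Landis and Koch bands. The function names and the four-slide toy data are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter
from math import comb


def accuracy(predicted, reference):
    """Fraction of slides whose predicted diagnosis matches the reference (consensus) label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two paired label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement p_o: share of slides given the same label twice.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e from each sequence's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # one identical label everywhere: kappa is undefined (0/0)
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correct/incorrect outcomes.

    Only discordant pairs matter: b = A right and B wrong, c = A wrong and B right.
    Under H0 (equal accuracy), b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired outcome lists must have equal length")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


def landis_koch(kappa):
    """Verbal label for a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical toy labels for four slides; NOT data from the study.
    reference = ["OSCC", "OSCC", "ameloblastoma", "lichen planus"]
    model_round1 = ["OSCC", "OSCC", "ameloblastoma", "lichen planus"]
    model_round2 = ["OSCC", "lichen planus", "ameloblastoma", "lichen planus"]
    other_model = ["OSCC", "OSCC", "odontogenic keratocyst", "lichen planus"]

    print("accuracy, round 2:", accuracy(model_round2, reference))
    k = cohen_kappa(model_round1, model_round2)
    print("intra-model kappa:", round(k, 3), landis_koch(k))
    right_a = [p == r for p, r in zip(model_round2, reference)]
    right_b = [p == r for p, r in zip(other_model, reference)]
    print("McNemar (b, c, p):", mcnemar_exact(right_a, right_b))
```

Under these assumptions, the reported κ values map onto the abstract's wording: 0.790 and 0.740 fall in the "substantial" band, and 0.918 falls in the "almost perfect" band. The exact McNemar form was chosen over the chi-square approximation because, with accuracies this high, only a handful of slides are discordant between any two models.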