Comparative evaluation of large language models for patient education after total knee arthroplasty.

Journal: The Knee
Published Date:

Abstract

BACKGROUND: Artificial intelligence (AI) tools are increasingly used to support healthcare communication. Total knee arthroplasty (TKA) is a common orthopedic procedure, and many patients seek perioperative information online; however, the accuracy, clinical applicability, and readability of AI-generated responses remain unclear. This study compared responses generated by two large language models (ChatGPT-5, OpenAI; Gemini Advanced v2.5, Google) to frequently asked patient questions related to TKA.

METHODS: A question pool was developed using Google Trends and major patient information portals. The 10 most frequently searched TKA-related questions were selected through expert review and submitted once to each model under standardized conditions. Responses were anonymized and independently evaluated by 10 board-certified orthopedic surgeons using a five-point Likert scale for medical accuracy and clinical applicability. Readability was assessed using six established indices. Paired comparisons were performed using the Wilcoxon signed-rank test, and inter-rater reliability was assessed using the intraclass correlation coefficient.

RESULTS: Both models received moderate-to-high expert ratings, with no significant differences in accuracy or clinical applicability (all P > 0.05). Expert agreement varied across topics. Gemini Advanced generated slightly less complex text on several readability indices, whereas other measures were comparable. All responses fell within a secondary-school readability range.

CONCLUSION: ChatGPT-5 and Gemini Advanced v2.5 demonstrated comparable performance in accuracy and clinical applicability for TKA-related patient questions. Although Gemini Advanced produced marginally simpler text, the differences were small and unlikely to be clinically meaningful. These tools should be used as supervised adjuncts rather than standalone sources of patient guidance.
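
The paired comparison named in the methods can be illustrated with a minimal sketch of the Wilcoxon signed-rank test. This is not the authors' analysis code: the function name `wilcoxon_signed_rank`, the choice of one paired score per question, and the ratings below are all hypothetical and invented for illustration, and the p-value uses the normal approximation with a tie correction and no continuity correction, which may differ from what the study's statistical software computed.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (illustrative sketch).

    Returns (w_plus, w_minus, n_nonzero, p_two_sided), where the p-value
    comes from the normal approximation with a tie correction.
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation under the null hypothesis of no paired difference.
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, n, min(p, 1.0)


# Hypothetical per-question Likert scores (1-5) for two models; NOT study data.
model_a = [4, 5, 4, 3, 4, 5, 4, 4, 3, 5]
model_b = [4, 4, 5, 3, 4, 4, 4, 5, 3, 4]

w_plus, w_minus, n_used, p_value = wilcoxon_signed_rank(model_a, model_b)
print(f"W+ = {w_plus}, W- = {w_minus}, n = {n_used}, p = {p_value:.3f}")
```

With only 10 questions and five-point ratings, many paired differences are zero or tied, so an exact test or an established statistics package would usually be preferable to this approximation in practice.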

Authors
