Comparison of ChatGPT-4 and ChatGPT-4 Plus Responses to Retinopathy of Prematurity-Related Questions and Performance Evaluation in Different Languages.
Journal:
Journal of Pediatric Ophthalmology and Strabismus
Published Date:
Feb 10, 2026
Abstract
PURPOSE: To compare the accuracy and performance of the ChatGPT-4 and ChatGPT-4 Plus (OpenAI) models in answering questions about retinopathy of prematurity (ROP) in different languages. METHODS: Both models were asked 15 questions about ROP in Turkish and in English. Clinicians rated each answer on a five-level scale: "incorrect," "inadequate," "adequate," "correct," and "very correct." The reliability of the clinicians' scores was analyzed using the intraclass correlation coefficient (ICC), and quality scores were compared using the t-test. RESULTS: According to the ICC analysis, the experts scoring the responses showed high agreement (95%). Response accuracy scores did not differ significantly between the models (ChatGPT-4: P = .704 and ChatGPT-4 Plus: P = .999) or between the languages (ChatGPT-4: P = .704 and ChatGPT-4 Plus: P = .999). The models performed with similar accuracy in Turkish and English. CONCLUSIONS: In this study, the ChatGPT-4 and ChatGPT-4 Plus models responded accurately to ROP-related questions, and language differences did not affect performance. Artificial intelligence-based models have the potential to overcome language barriers in multilingual health care settings.
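The two statistics named in METHODS can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not say which ICC form or which t-test variant was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) and a paired t-test is an assumption, as is coding the five rating levels as the integers 1 to 5. The function names `icc2_1` and `paired_t` are hypothetical, and any scores passed to them here are made up for demonstration.

```python
from math import sqrt
from statistics import mean, stdev


def icc2_1(ratings):
    """ICC(2,1) from a table with one row per rated answer and one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    The abstract does not state which ICC form the study used.
    """
    n = len(ratings)       # number of rated answers
    k = len(ratings[0])    # number of raters
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares for answers (rows), raters (columns), and residual error
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def paired_t(a, b):
    """Paired t statistic for two score lists over the same questions.

    Assumption: a paired design (same questions put to both models).
    Undefined when every pairwise difference is identical (zero variance).
    """
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

With ratings coded 1 ("incorrect") to 5 ("very correct"), `icc2_1` takes one row per answer and one column per clinician, while `paired_t` takes the per-question scores of the two models or of the two languages. Converting the t statistic to a P value additionally requires the t distribution with n - 1 degrees of freedom, which is omitted here.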