Evaluating large language models in answering patient questions about eye removal surgeries.

Journal: Orbit (Amsterdam, Netherlands)

Published Date: Sep 30, 2025

Abstract

PURPOSE: To evaluate the performance of ChatGPT-4 and Gemini, two large language models (LLMs), in addressing frequently asked questions (FAQs) about eye removal surgeries.

METHODS: A set of 24 FAQs related to enucleation and evisceration was identified through a Google search and categorized into preoperative, procedural, and postoperative topics. Each question was submitted three times to ChatGPT-4o and Gemini, and responses were evaluated for consistency, accuracy, appropriateness, and potential harm. Readability was assessed using Flesch Reading Ease and Flesch-Kincaid Grade Level scores.

RESULTS: Gemini exhibited higher response consistency than ChatGPT (p = 0.043), while ChatGPT produced longer responses (mean length: 169.3 vs. 109.9 words; p < 0.001). Gemini's responses were more readable, with a higher Flesch Reading Ease score (39.0 vs. 31.3; p = 0.001) and a lower Flesch-Kincaid Grade Level (11.6 vs. 14.0; p < 0.001). Both LLMs demonstrated comparable accuracy and low potential for harm, with 79.2% of Gemini responses and 77.1% of ChatGPT responses rated as completely correct. The sources cited by Gemini included academic institutions (91.7%) and medical practices (8.3%), while ChatGPT exclusively referenced academic sources.

CONCLUSIONS: ChatGPT and Gemini showed comparable accuracy and low harm potential when addressing patient questions about eye removal surgeries. Gemini provided more consistent and readable responses, but both LLMs exceeded the recommended readability levels for patient education. These findings highlight the potential of LLMs to assist in patient communication and clinical education while underscoring the need for careful oversight in their implementation.
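The two readability measures named in the methods follow standard published formulas based on average sentence length and average syllables per word. The sketch below shows how they could be computed for a single response. It is illustrative only: the abstract does not say which tool the authors used, and the `count_syllables` heuristic and the `readability` function name are assumptions introduced here, so scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumed heuristic, not the study's method).

    Counts groups of consecutive vowels and drops one for a silent
    trailing 'e' (but not for '-le' endings such as 'table').
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    # Naive sentence and word splitting; real tools handle abbreviations etc.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Standard Flesch Reading Ease: higher means easier to read.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Standard Flesch-Kincaid Grade Level: approximate US school grade.
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


if __name__ == "__main__":
    ease, grade = readability("The cat sat on the mat.")
    print(f"Reading Ease: {ease:.1f}, Grade Level: {grade:.1f}")
```

Because both formulas depend only on sentence length and syllable density, shorter sentences and shorter words raise the Reading Ease score and lower the Grade Level, which is the direction of the difference reported between Gemini and ChatGPT.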

Authors

Niloufar Bineshfar
Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida, USA.

Chloe Shields
Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida, USA.

Natalia Davila
Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida, USA.

Sugi Panneerselvam
Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida, USA.

Tejus Pradeep
School of Medicine, Johns Hopkins University, Baltimore, MD, USA.

Marissa K Shoji
Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A.

Wendy W Lee
Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida, USA.


External Resources

PubMed: 41026610
