Large language models provide discordant information compared to ophthalmology guidelines.

Journal: Scientific reports

Published Date: Jul 1, 2025

Abstract

This study evaluated the agreement of large language models (LLMs) with the Preferred Practice Patterns (PPP) guidelines developed by the American Academy of Ophthalmology (AAO). Questions based on the AAO PPP were submitted to five LLMs: GPT-o1 and GPT-4o by OpenAI, Claude 3.5 Sonnet by Anthropic, Gemini 1.5 Pro by Google, and DeepSeek-R1-Lite-Preview. Questions were classified as "open" or "confirmatory with positive/negative ground-truth answer". Three blinded investigators classified each response as "concordant", "undetermined", or "discordant" with respect to the AAO PPP. Undetermined and discordant answers were analyzed to assess their potential to harm patients. Responses that referenced peer-reviewed articles were recorded. In total, 147 questions were submitted to the LLMs. Concordant answers numbered 135 (91.8%) for GPT-o1, 133 (90.5%) for GPT-4o, 136 (92.5%) for Claude 3.5 Sonnet, 124 (84.4%) for Gemini 1.5 Pro, and 119 (81.0%) for DeepSeek-R1-Lite-Preview (P = 0.006). The highest number of harmful answers was reported for Gemini 1.5 Pro (n = 6, 4.1%), followed by DeepSeek-R1-Lite-Preview (n = 5, 3.4%). Gemini 1.5 Pro was the most transparent model, citing references in 86 responses (58.5%). The other LLMs referenced papers in 9.5% to 15.6% of their responses. LLMs can provide answers that are discordant with ophthalmology guidelines, potentially harming patients by delaying diagnosis or recommending suboptimal treatments.
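The percentages in the abstract can be rechecked from the reported counts. The following is a minimal sketch, not the authors' analysis: the abstract does not name the statistical test behind P = 0.006, so the Pearson chi-square test on a 5x2 table (concordant vs. not concordant) used here is an assumption, and the function names `percent` and `chi_square_p_5x2` are hypothetical.

```python
import math

TOTAL_QUESTIONS = 147  # questions submitted to each model (from the abstract)

# Concordant-answer counts per model, as reported in the abstract.
CONCORDANT = {
    "GPT-o1": 135,
    "GPT-4o": 133,
    "Claude 3.5 Sonnet": 136,
    "Gemini 1.5 Pro": 124,
    "DeepSeek-R1-Lite-Preview": 119,
}


def percent(count, total=TOTAL_QUESTIONS):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def chi_square_p_5x2(concordant_counts, total=TOTAL_QUESTIONS):
    """Pearson chi-square test on a 5x2 table (assumed test, see lead-in).

    Rows are the five models; columns are concordant / not concordant.
    Returns (statistic, p_value). The closed-form survival function
    exp(-x/2) * (1 + x/2) is valid only for 4 degrees of freedom,
    hence the restriction to exactly five groups.
    """
    counts = list(concordant_counts)
    if len(counts) != 5:
        raise ValueError("closed-form p-value here assumes exactly 5 groups")
    # Every model answered the same number of questions, so the expected
    # count in each column is simply the column mean.
    expected_yes = sum(counts) / len(counts)
    expected_no = total - expected_yes
    statistic = sum(
        (c - expected_yes) ** 2 / expected_yes
        + ((total - c) - expected_no) ** 2 / expected_no
        for c in counts
    )
    p_value = math.exp(-statistic / 2) * (1 + statistic / 2)
    return statistic, p_value


if __name__ == "__main__":
    for model, count in CONCORDANT.items():
        print(f"{model}: {count}/{TOTAL_QUESTIONS} = {percent(count)}%")
    stat, p = chi_square_p_5x2(CONCORDANT.values())
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Under these assumptions the recomputed percentages match the abstract, and the p-value comes out near 0.006. Because the same 147 questions were posed to every model, a paired test could also be appropriate, so this agreement should not be read as confirming which test the authors used.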

Authors

Andrea Taloni
Department of Translational Medicine, University of Ferrara, Ferrara, Italy.

Antonia Carmen Sangregorio
Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy.

Giuseppe Alessio
Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy.

Maria Angela Romeo
Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy.

Giulia Coco
Department of Clinical Sciences and Translational Medicine, University of Rome Tor Vergata, Rome, Italy.

Linda Marie Louise Busin
Department of Ophthalmology, Ospedali Privati Forlì "Villa Igea", Forlì, Italy.

Andrea Sollazzo
Department of Translational Medicine, University of Ferrara, Ferrara, Italy.

Vincenzo Scorcia
Department of Ophthalmology, "Magna Graecia" University of Catanzaro, Catanzaro, Italy.

Giuseppe Giannaccare
Eye Clinic, Department of Surgical Sciences, University of Cagliari, Cagliari, Italy.

Keywords

Humans; Language; Large Language Models; Ophthalmology; Practice Guidelines as Topic

External Resources

PubMed ID (PMID): 40596239
