Multimodal Performance of GPT-4 in Complex Ophthalmology Cases.

Journal: Journal of Personalized Medicine

Published Date: Apr 21, 2025

Abstract

The integration of multimodal capabilities into GPT-4 represents a transformative leap for artificial intelligence in ophthalmology, yet its utility in scenarios requiring advanced reasoning remains underexplored. This study evaluates GPT-4's multimodal performance on open-ended diagnostic and next-step reasoning tasks in complex ophthalmology cases, comparing it against human expertise.

GPT-4 was assessed across three study arms: (1) text-based case details with figure descriptions, (2) cases with text and accompanying ophthalmic figures, and (3) cases with figures only (no figure descriptions). We compared GPT-4's diagnostic and next-step accuracy across arms and benchmarked its performance against three board-certified ophthalmologists.

GPT-4 achieved 38.4% (95% CI [33.9%, 43.1%]) diagnostic accuracy and 57.8% (95% CI [52.8%, 62.2%]) next-step accuracy when prompted with figures without descriptions. Diagnostic accuracy declined significantly compared to text-only prompts (p = 0.007), though next-step performance was similar (p = 0.140). Adding figure descriptions restored diagnostic accuracy (49.3%) to near parity with text-only prompts (p = 0.684). Using figures without descriptions, GPT-4's diagnostic accuracy was comparable to that of two ophthalmologists (p = 0.30 and p = 0.41) but fell short of the highest-performing ophthalmologist (p = 0.0004). For next-step accuracy, GPT-4 was similar to one ophthalmologist (p = 0.22) but underperformed relative to the other two (p = 0.0015 and p = 0.0017).

GPT-4's diagnostic performance diminishes when it relies solely on ophthalmic images without textual context, highlighting limitations in its current multimodal capabilities. Despite this, GPT-4 performed comparably to at least one ophthalmologist on both diagnostic and next-step reasoning tasks, underscoring its potential as an assistive tool. Future research should refine multimodal prompts and explore iterative or sequential prompting strategies to optimize AI-driven interpretation of complex ophthalmic datasets.
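The abstract reports each accuracy with a 95% CI and gives p-values for the between-arm and GPT-4-versus-ophthalmologist comparisons, but it does not state which statistical methods were used or how many cases were scored. As a minimal sketch only, assuming a Wilson score interval for an accuracy proportion and an exact McNemar test for two conditions scored on the same cases (both are assumptions, not necessarily the authors' methods), the calculations could look like this. The function names `wilson_ci` and `mcnemar_exact` and all counts below are hypothetical.

```python
import math


def wilson_ci(correct: int, total: int, z: float = 1.959963984540054) -> tuple[float, float]:
    """Wilson score interval for a proportion (the default z gives a 95% interval)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    # The center is pulled from p toward 0.5, so the interval can be asymmetric around p.
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant-pair counts.

    b: cases answered correctly under condition A only.
    c: cases answered correctly under condition B only.
    Cases both conditions got right (or both got wrong) do not enter the test.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 each discordant case is equally likely to favor A or B: Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; they are not from the study.
    lo, hi = wilson_ci(correct=50, total=100)
    print(f"accuracy 50.0%, 95% CI [{lo:.1%}, {hi:.1%}]")
    print(f"paired comparison p = {mcnemar_exact(b=4, c=15):.4f}")
```

The Wilson interval is used here rather than the simpler Wald interval because it stays within [0, 1] and behaves better for small samples or accuracies near 0% or 100%. The exact McNemar test is only appropriate if both conditions (two prompt arms, or GPT-4 and one ophthalmologist) were scored on the same set of cases, which this sketch assumes.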

Authors

David Mikhail

Faculty of Medicine, University of Toronto, Toronto, ON, Canada.
Daniel Milad

Faculty of Medicine, University of Montreal, Montreal, QC, Canada; Department of Ophthalmology, Centre Hospitalier de l'Université de Montréal, Montreal, QC, Canada.
Fares Antaki

Department of Ophthalmology, University of Montreal, Montreal, QC H3T 1J4, Canada.
Jason Milad

Department of Software Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
Andrew Farah

Faculty of Medicine, McGill University, Montreal, QC H3A 0G4, Canada.
Thomas Khairy

Faculty of Medicine, McGill University, Montreal, QC H3A 0G4, Canada.
Jonathan El-Khoury

Department of Ophthalmology, University of Montreal, Montreal, QC H3T 1J4, Canada.
Kenan Bachour

Department of Ophthalmology, University of Montreal, Montreal, QC H3T 1J4, Canada.
Andrei-Alexandru Szigiato

Department of Ophthalmology, Hôpital du Sacré-Coeur de Montréal, Montreal, QC H4J 1C5, Canada.
Taylor Nayman

Department of Ophthalmology, University of Montreal, Montreal, QC H3T 1J4, Canada.
Guillaume A Mullie

Department of Ophthalmology, University of Montreal, Montreal, QC H3T 1J4, Canada.
Renaud Duval

Department of Ophthalmology, University of Montreal, Montreal, QC H3T 1J4, Canada.

Keywords

No keywords available for this article.

External Resources

PubMed ID (PMID): 40278339
