A comparative analysis of DeepSeek R1, DeepSeek-R1-Lite, OpenAI o1 Pro, and Grok 3 performance on ophthalmology board-style questions.
Journal:
Scientific Reports
Published Date:
Jul 2, 2025
Abstract
The ability of large language models (LLMs) to accurately answer medical board-style questions reflects their potential to benefit medical education and real-time clinical decision-making. With the recent advent of reasoning models, the latest LLMs excel at solving complex problems in benchmark math and science tests. This study assessed the performance of first-generation reasoning models (DeepSeek's R1 and R1-Lite, OpenAI's o1 Pro, and Grok 3) on 493 ophthalmology questions sourced from the StatPearls and EyeQuiz question banks. o1 Pro achieved the highest overall accuracy (83.4%), significantly outperforming DeepSeek R1 (72.5%), DeepSeek-R1-Lite (76.5%), and Grok 3 (69.2%) (p < 0.001 for all pairwise comparisons). o1 Pro also demonstrated superior performance on questions from eight of nine ophthalmologic subfields, on questions of second- and third-order cognitive complexity, and on image-based questions. DeepSeek-R1-Lite performed second best despite its relatively small memory requirements, while Grok 3 performed worst overall. These findings demonstrate that the strong performance of first-generation reasoning models extends beyond benchmark tests to high-complexity ophthalmology questions. While these findings suggest a potential role for reasoning models in medical education and clinical practice, further research is needed to understand their performance with real-world data, their integration into educational and clinical settings, and human-AI interactions.