Enhancing Interpretability of Ocular Disease Diagnosis: A Zero-Shot Study of Multimodal Large Language Models.
Journal:
Studies in Health Technology and Informatics
Published Date:
Aug 7, 2025
Abstract
Visual foundation models have advanced ocular disease diagnosis, yet providing interpretable explanations remains challenging. We evaluate multimodal large language models (LLMs) for generating explanations of ocular diagnoses, combining Vision Transformer-derived saliency maps with clinical metadata. After fine-tuning RETFound for improved performance on the BRSET dataset (AUC-ROC of 0.9664 for diabetic retinopathy and 0.8611 for glaucoma), we compared five LLMs through technical and clinical evaluations. GPT-o1 demonstrated superior performance across both technical dimensions and clinical metrics (79.32% precision, 77.18% recall, 78.25% F1, 20.68% hallucination rate). Our findings highlight the importance of strong underlying diagnostic accuracy and advanced model architecture for generating reliable clinical explanations, and suggest opportunities for integrating verification mechanisms in future systems. The code and details are available at: https://github.com/YatingPan/ocular-llm-explainability.
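The clinical metrics quoted above follow from per-statement counts of correct versus unsupported model output. A minimal sketch, assuming binary grading of each generated statement (true positive, false positive, false negative); the counts below are illustrative values chosen only to approximate the reported GPT-o1 figures, not data from the study:

```python
def clinical_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from statement-level counts.

    Treating a "false positive" as a generated statement unsupported by the
    image or metadata, fp / (tp + fp) doubles as a hallucination rate, which
    is why precision and hallucination rate sum to 100% in the abstract.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination_rate": 1.0 - precision,
    }

# Illustrative counts roughly matching the reported figures
# (precision 79.3%, hallucination rate 20.7%).
m = clinical_metrics(tp=793, fp=207, fn=234)
```

Note that under this framing, hallucination rate is simply the complement of precision, consistent with the paired values (79.32% / 20.68%) reported in the abstract.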