Enhancing Interpretability of Ocular Disease Diagnosis: A Zero-Shot Study of Multimodal Large Language Models.
Journal:
Studies in Health Technology and Informatics
Published Date:
Aug 7, 2025
Abstract
Visual foundation models have advanced ocular disease diagnosis, yet providing interpretable explanations remains challenging. We evaluate multimodal large language models (LLMs) for generating explanations of ocular diagnoses, combining Vision Transformer-derived saliency maps with clinical metadata. After fine-tuning RETFound for improved performance on the BRSET dataset (AUC-ROC of 0.9664 for diabetic retinopathy and 0.8611 for glaucoma), we compared five LLMs through technical and clinical evaluations. GPT-o1 demonstrated superior performance across both technical dimensions and clinical metrics (79.32% precision, 77.18% recall, 78.25% F1, 20.68% hallucination rate). Our findings highlight the importance of strong underlying diagnostic accuracy and advanced model architecture for generating reliable clinical explanations, and suggest opportunities for integrating verification mechanisms in future systems. The code and details are available at: https://github.com/YatingPan/ocular-llm-explainability.
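The clinical metrics quoted above follow from per-statement counts of correct versus unsupported model output. A minimal sketch, assuming binary grading of each generated statement (true positive, false positive, false negative); the counts below are illustrative values chosen only to approximate the reported GPT-o1 figures, not data from the study:

```python
def clinical_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from statement-level counts.

    Treating a "false positive" as a generated statement unsupported by the
    image or metadata, fp / (tp + fp) doubles as a hallucination rate, which
    is why precision and hallucination rate sum to 100% in the abstract.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination_rate": 1.0 - precision,
    }

# Illustrative counts roughly matching the reported figures
# (precision 79.3%, hallucination rate 20.7%).
m = clinical_metrics(tp=793, fp=207, fn=234)
```

Note that under this framing, hallucination rate is simply the complement of precision, consistent with the paired values (79.32% / 20.68%) reported in the abstract.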