A multimodal visual-language foundation model for computational ophthalmology.

Journal: npj Digital Medicine

Published Date: Jun 21, 2025

Abstract

Early detection of eye diseases is vital for preventing vision loss. Existing ophthalmic artificial intelligence models focus on single modalities, overlooking multi-view information and struggling with rare diseases due to long-tail distributions. We propose EyeCLIP, a multimodal visual-language foundation model trained on 2.77 million ophthalmology images from 11 modalities with partial clinical text. Our novel pretraining strategy combines self-supervised reconstruction, multimodal image contrastive learning, and image-text contrastive learning to capture shared representations across modalities. EyeCLIP demonstrates robust performance across 14 benchmark datasets, excelling in disease classification, visual question answering, and cross-modal retrieval. It also exhibits strong few-shot and zero-shot capabilities, enabling accurate predictions in real-world, long-tail scenarios. EyeCLIP offers significant potential for detecting both ocular and systemic diseases, and bridging gaps in real-world clinical applications.
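The abstract names three pretraining objectives (self-supervised reconstruction, multimodal image contrastive learning, and image-text contrastive learning) but does not say how they are computed or weighted. The sketch below is a generic illustration of how such terms are commonly combined, not EyeCLIP's actual implementation: it assumes a symmetric InfoNCE-style contrastive loss, a mean-squared-error reconstruction term, and equal weights. The function names, the temperature of 0.07, and the weights are all hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(a_embs, b_embs, temperature=0.07):
    """InfoNCE-style loss: a_embs[i] and b_embs[i] are treated as a matched pair.

    The same function can stand in for image-image (two modalities of one eye)
    or image-text pairs. The temperature value is an assumption.
    """
    a = [_normalize(v) for v in a_embs]
    b = [_normalize(v) for v in b_embs]
    logits = [
        [sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the a-to-b and b-to-a directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


def reconstruction_loss(predicted, target):
    """Mean squared error between reconstructed and original values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def combined_pretraining_loss(recon, image_image, image_text,
                              weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; equal weights are an assumption."""
    w_recon, w_ii, w_it = weights
    return w_recon * recon + w_ii * image_image + w_it * image_text
```

With correctly paired embeddings the contrastive term approaches zero, and it grows when the pairs are shuffled, which is the property that pulls matched views of the same case together in a shared embedding space.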

Authors

Danli Shi

State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.

Weiyi Zhang

Key Laboratory of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China.

Jiancheng Yang

Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China.

Siyu Huang

Department of Surgery, University of Melbourne, Parkville, Victoria, Australia.

Xiaolan Chen

Jiangsu Agri-animal Husbandry Vocational College, Taizhou, 225300 China.

Pusheng Xu

School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China.

Kai Jin

Eye Center, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.

Shan Lin

Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.

Jin Wei

Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China.

Mayinuer Yusufu

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Australia; Department of Surgery (Ophthalmology), The University of Melbourne, Melbourne, Australia.

Shunming Liu

Department of Ophthalmology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China.

Qing Zhang

Department of Respiratory Medicine, Affiliated Zhongshan Hospital of Dalian University, Dalian, China.

Zongyuan Ge

AIM for Health Lab, Faculty of IT, Monash University, Clayton, Victoria, Australia; Monash-Airdoc Research Lab, Faculty of IT, Monash University, Clayton, Victoria, Australia.

Xun Xu

BGI-Shenzhen, Shenzhen 518083, China.

Mingguang He

State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China; Centre for Eye Research Australia; Departments of Ophthalmology and Surgery, University of Melbourne, Melbourne, Australia. Electronic address: mingguang.he@unimelb.edu.au.

External Resources

PubMed ID (PMID): 40542189
