Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection
Journal:
arXiv
Published Date:
May 8, 2025
Abstract
Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to
learn robust representations from large-scale natural image datasets, enhancing
their generalization across domains. In retinal imaging, foundation models
pretrained on either natural or ophthalmic data have shown promise, but the
benefits of in-domain pretraining remain uncertain. To investigate this, we
benchmark six SSL-pretrained ViTs on seven digital fundus image (DFI) datasets
totaling 70,000 expert-annotated images for the task of moderate-to-late
age-related macular degeneration (AMD) identification. Our results show that
iBOT pretrained on natural images achieves the highest out-of-distribution
generalization, with AUROCs of 0.80-0.97, outperforming domain-specific models,
which achieved AUROCs of 0.78-0.96 and a baseline ViT-L with no pretraining,
which achieved AUROCs of 0.68-0.91. These findings highlight the value of
foundation models in improving AMD identification and challenge the assumption
that in-domain pretraining is necessary. Furthermore, we release BRAMD, an
open-access dataset (n=587) of DFIs with AMD labels from Brazil.