EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model
Journal:
arXiv
Published Date:
Apr 18, 2025
Abstract
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant
potential in healthcare, but their reliance on general medical data and
coarse-grained global visual understanding limits them in intelligent
ophthalmic diagnosis. Currently, intelligent ophthalmic diagnosis faces three
major challenges: (i) Data. The lack of deeply annotated, high-quality,
multi-modal ophthalmic visual instruction data; (ii) Benchmark. The absence of
a comprehensive and systematic benchmark for evaluating diagnostic performance;
(iii) Model. The difficulty of adapting holistic visual architectures to
fine-grained, region-specific ophthalmic lesion identification. In this paper,
we propose the Eyecare Kit, which systematically tackles these three
challenges with a tailored dataset, benchmark, and model. First, we
construct a multi-agent data engine with real-life ophthalmology data to
produce Eyecare-100K, a high-quality ophthalmic visual instruction dataset.
Subsequently, we design Eyecare-Bench, a benchmark that comprehensively
evaluates the overall performance of LVLMs on intelligent ophthalmic diagnosis
tasks across multiple dimensions. Finally, we develop EyecareGPT, a model
optimized for fine-grained ophthalmic visual understanding that incorporates
an adaptive resolution mechanism and a layer-wise dense connector. Extensive
experimental results indicate that EyecareGPT achieves state-of-the-art
performance across a range of ophthalmic tasks, underscoring its potential
to advance open research in intelligent ophthalmic diagnosis.
Our project is available at https://github.com/DCDmllm/EyecareGPT.