Fine-tuning Vision Language Models with Graph-based Knowledge for Explainable Medical Image Analysis

Journal: arXiv

Published Date: Mar 12, 2025

Abstract

Accurate staging of Diabetic Retinopathy (DR) is essential for guiding timely interventions and preventing vision loss. However, current staging models are difficult to interpret, and most public datasets contain no clinical reasoning or interpretation beyond image-level labels. In this paper, we present a novel method that integrates graph representation learning with vision-language models (VLMs) to deliver explainable DR diagnosis. Our approach leverages optical coherence tomography angiography (OCTA) images by constructing biologically informed graphs that encode key retinal vascular features such as vessel morphology and spatial connectivity. A graph neural network (GNN) then performs DR staging, while integrated gradients highlight the critical nodes and edges, and their individual features, that drive the classification decisions. We collect this graph-based knowledge, which attributes the model's prediction to physiological structures and their characteristics, and transform it into textual descriptions for VLMs. We perform instruction-tuning with these textual descriptions and the corresponding images to train a student VLM. This final agent can classify the disease and explain its decision in a human-interpretable way based solely on a single image input. Experimental evaluations on both proprietary and public datasets demonstrate that our method not only improves classification accuracy but also offers more clinically interpretable results. An expert study further demonstrates that our method provides more accurate diagnostic explanations and paves the way for precise localization of pathologies in OCTA images.
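The abstract describes a pipeline in which a GNN classifies a vessel graph, integrated gradients attribute the prediction to nodes and their features, and those attributions are turned into text. The sketch below illustrates only the attribution and text-conversion steps on a toy graph, assuming a one-layer tanh graph convolution with mean pooling and an all-zeros baseline. It is not the authors' implementation: the model, the feature names (length, radius, tortuosity), the helpers `ToyGCN`, `integrated_gradients`, and `describe`, and the sentence template are all hypothetical. Integrated gradients is approximated with a midpoint Riemann sum over the straight path from the baseline to the input.

```python
import numpy as np

# Hypothetical per-node vessel features; names are illustrative only.
FEATURES = ["length", "radius", "tortuosity"]

def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

class ToyGCN:
    """Hypothetical one-layer graph convolution + mean pooling -> scalar class score."""
    def __init__(self, adj, w, v):
        self.a_hat = normalized_adjacency(adj)
        self.w, self.v = w, v

    def score(self, x):
        h = np.tanh(self.a_hat @ x @ self.w)
        return float(h.mean(axis=0) @ self.v)

    def grad(self, x):
        """Analytic gradient of the score with respect to node features x."""
        h = np.tanh(self.a_hat @ x @ self.w)
        n = x.shape[0]
        d_h = np.tile(self.v, (n, 1)) / n      # d score / d H
        d_z = d_h * (1.0 - h ** 2)             # back through tanh
        return self.a_hat.T @ d_z @ self.w.T   # back through A_hat @ X @ W

def integrated_gradients(model, x, baseline, steps=200):
    """Midpoint Riemann approximation of integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += model.grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

def describe(attr, top_k=2):
    """Turn node/feature attributions into short template sentences."""
    node_scores = attr.sum(axis=1)
    order = np.argsort(-np.abs(node_scores))[:top_k]
    lines = []
    for i in order:
        j = int(np.argmax(np.abs(attr[i])))
        direction = "raises" if node_scores[i] > 0 else "lowers"
        lines.append(f"Vessel node {i} {direction} the score "
                     f"(attribution {node_scores[i]:+.3f}); dominant feature: {FEATURES[j]}.")
    return lines

# Usage on a 4-node star-shaped toy graph with random features and weights.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]], dtype=float)
x = rng.normal(size=(4, 3))
model = ToyGCN(adj, rng.normal(size=(3, 5)), rng.normal(size=5))
baseline = np.zeros_like(x)
attr = integrated_gradients(model, x, baseline)
for sentence in describe(attr):
    print(sentence)
```

A useful sanity check is the completeness property of integrated gradients: the attributions should sum (up to discretization error) to the score difference between the input and the baseline. Sentences like the ones printed here are the kind of template text that could then be paired with the image for instruction-tuning, though the paper's actual prompt format is not given in this abstract.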

Authors

Chenjun Li
Laurin Lux
Alexander H. Berger
Martin J. Menten
Mert R. Sabuncu
Johannes C. Paetzold

External Resources

View on arXiv (http://arxiv.org/abs/2503.09808v1)
