Leveraging Vision Transformers in Multimodal Models for Retinal OCT Analysis.

Journal: Studies in health technology and informatics

Published Date: May 15, 2025

Abstract

Optical Coherence Tomography (OCT) has become an indispensable imaging modality in ophthalmology, providing high-resolution cross-sectional images of the retina. Accurate classification of OCT images is crucial for diagnosing retinal diseases such as Age-related Macular Degeneration (AMD) and Diabetic Macular Edema (DME). This study explores the efficacy of various deep learning models, including convolutional neural networks (CNNs) and Vision Transformers (ViTs), in classifying OCT images. We also investigate the impact of integrating metadata (patient age, sex, eye laterality, and year) into the classification process, even when a significant portion of the metadata is missing. Our results demonstrate that multimodal models using both image and metadata inputs, such as the Multimodal ResNet18, can achieve performance competitive with image-only models such as DenseNet121. Notably, DenseNet121 and Multimodal ResNet18 both achieved the highest accuracy, 95.16%, with DenseNet121 showing a slightly higher F1-score of 0.9313. The multimodal ViT-based model also demonstrated promising results, achieving an accuracy of 93.22%, which indicates the potential of ViTs in medical image analysis, especially for handling complex multimodal data.
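The abstract does not say how the missing metadata were handled or how the two inputs were combined. As an illustration only, the sketch below shows one common pattern: each metadata field is encoded as a value plus a "missing" flag, and the result is concatenated with an image feature vector before classification. The function names, the scaling constants, and the choice of "female" and "left" as the positive categories are all hypothetical and are not taken from the paper.

```python
from typing import Optional


def encode_numeric(value: Optional[float], offset: float, scale: float) -> list[float]:
    """Return [scaled value, missing flag]; a missing value becomes [0.0, 1.0]."""
    if value is None:
        return [0.0, 1.0]
    return [(value - offset) / scale, 0.0]


def encode_binary(value: Optional[str], positive: str) -> list[float]:
    """Return [1.0 or 0.0, missing flag]; a missing value becomes [0.0, 1.0]."""
    if value is None:
        return [0.0, 1.0]
    return [1.0 if value == positive else 0.0, 0.0]


def encode_metadata(age: Optional[float], sex: Optional[str],
                    laterality: Optional[str], year: Optional[int]) -> list[float]:
    """Encode the four metadata fields named in the abstract as 8 numbers.

    Scaling constants are arbitrary assumptions chosen for illustration.
    """
    return (
        encode_numeric(age, offset=0.0, scale=100.0)
        + encode_binary(sex, positive="female")
        + encode_binary(laterality, positive="left")
        + encode_numeric(year, offset=2000.0, scale=50.0)
    )


def fuse(image_features: list[float], metadata_features: list[float]) -> list[float]:
    """Late fusion by concatenation; a classifier head would consume the result."""
    return list(image_features) + list(metadata_features)


if __name__ == "__main__":
    # A record with sex and year missing still yields a fixed-length vector.
    meta = encode_metadata(age=70, sex=None, laterality="left", year=None)
    fused = fuse([0.1, 0.2], meta)  # [0.1, 0.2] stands in for CNN or ViT features
    print(len(meta), len(fused))
```

Because every record yields a vector of the same length, a model built this way can train on incomplete records instead of discarding them, and the missing flags let it distinguish "unknown" from a genuine zero value.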

Authors

Georgios Feretzakis
School of Science and Technology, Hellenic Open University, Patras, Greece.

Christina Karakosta
School of Medicine, National and Kapodistrian University of Athens, Athens, Greece.

Aris Gkoulalas-Divanis
IBM Watson Health, Cambridge, Massachusetts, USA.

Anastasios Bisoukis
Medical Retina Department, Bristol Eye Hospital, Bristol, UK.

Iris Zoe Boufeas
Barts and The London School of Medicine and Dentistry, Queen Mary University of London, UK.

Effrosyni Bazakidou
Medical School, Humanitas University, Milan, Italy.

Aikaterini Sakagianni
Sismanogleio General Hospital, Intensive Care Unit, Marousi, Greece.

Dimitris Kalles
School of Science and Technology, Hellenic Open University, Patras, Greece.

Vassilios S Verykios
School of Science and Technology, Hellenic Open University, Patras, Greece.

Keywords

Deep Learning; Humans; Image Interpretation, Computer-Assisted; Macular Degeneration; Neural Networks, Computer; Retina; Retinal Diseases; Tomography, Optical Coherence

External Resources

PubMed ID: 40380672
