EffiViT: Hybrid CNN-Transformer for Retinal Imaging.
Journal:
Computers in biology and medicine
PMID:
40249994
Abstract
The human eye is a vital sensory organ, and the retina is the component responsible for generating visual signals. Because of its characteristics, the retina can reveal the presence of ocular diseases, so early detection and automated diagnosis of retinal disease are crucial for preventing both temporary and permanent blindness. This work introduces a framework that combines the complementary strengths of EfficientNet-B4 and a Vision Transformer for attention-driven analysis of retinal images. Rather than a conventional hybridization, the framework embeds EfficientNet-B4 as a multiscale feature encoder that produces discriminative feature maps preserving both local and intermediate contextual information. A Vision Transformer is then incorporated to capture and model global dependencies through its attention mechanism. The combination captures intricate patterns and focuses on the pertinent regions of the image, enabling precise and reliable classification. The proposed model achieved an AUC of 0.9466, an mAP of 0.7865, an F1-score of 0.75, and a Model Score of 0.8665, a 5.17% increase in the overall score over the previous state-of-the-art methods on the same task. This improvement underscores the effectiveness of the hybrid model in capturing both local and global contextual information, making it a robust and reliable tool for retinal diagnosis.
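The abstract does not define how the Model Score is computed. The reported figures are consistent with an unweighted mean of AUC and mAP, and the short sketch below checks that arithmetic. The averaging rule and the function name `model_score` are assumptions made for illustration, not a definition taken from the paper.

```python
def model_score(auc: float, mean_ap: float) -> float:
    """Unweighted mean of AUC and mAP.

    Assumed definition: the abstract reports the Model Score
    but does not state how it is derived.
    """
    return (auc + mean_ap) / 2.0


# Values reported in the abstract.
reported_auc = 0.9466
reported_map = 0.7865
reported_score = 0.8665

computed = model_score(reported_auc, reported_map)

# Compare the assumed formula against the reported score,
# allowing for rounding to four decimal places.
matches = abs(computed - reported_score) < 1e-4
print(f"computed={computed:.5f} reported={reported_score} matches={matches}")
```

If the paper defines the score differently (for example with a weighting, or with the F1-score included), the function body would need to change accordingly.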