Transformer attention fusion for fine-grained medical image classification.
Journal:
Scientific Reports
Published Date:
Jul 1, 2025
Abstract
Fine-grained visual classification is fundamental to medical image applications because it enables the detection of subtle lesions. Diabetic retinopathy (DR) is a preventable cause of blindness that requires accurate and timely diagnosis to avert vision loss. Automated DR classification systems face challenges including irregular lesion morphology, uneven distributions between image classes, and inconsistent image quality, all of which reduce diagnostic accuracy at early detection stages. To address these problems, we propose MSCAS-Net (Multi-Scale Cross and Self-Attention Network), which uses the Swin Transformer as its backbone. It extracts features at three resolutions (12 × 12, 24 × 24, and 48 × 48), allowing it to capture both subtle local features and global context. The model applies self-attention to strengthen spatial relationships within each scale and cross-attention to align feature patterns across scales, thereby building a comprehensive multi-scale representation. This dual attention mechanism improves the model's ability to focus on diagnostically significant lesions. MSCAS-Net achieves the best performance on the APTOS, DDR, and IDRiD benchmarks, reaching accuracies of 93.80%, 89.80%, and 86.70%, respectively. Because it learns stable features, the model handles imbalanced datasets and inconsistent image quality without requiring data augmentation. By combining high diagnostic precision with interpretability, MSCAS-Net marks a breakthrough in automated DR diagnostics and an efficient AI-powered clinical decision support system. This research demonstrates how fine-grained visual classification methods benefit the early detection and treatment of DR.
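The dual self- and cross-attention fusion described in the abstract can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the authors' released implementation: the module names (ScaleFusionBlock, MSCASHead), the embedding dimension and head count, the Swin stage channel widths (1024/512/256 for the 12 × 12, 24 × 24, and 48 × 48 maps), and the use of five output classes (the standard DR severity grades) are all hypothetical choices.

```python
# Sketch of multi-scale self- + cross-attention fusion over Swin features.
# All names, dimensions, and head counts are assumptions for illustration.
import torch
import torch.nn as nn


class ScaleFusionBlock(nn.Module):
    """Self-attention within one scale, then cross-attention to other scales."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, context):
        # x:       (B, N_x, dim) tokens from one resolution (e.g. 12x12 -> 144)
        # context: (B, N_c, dim) tokens gathered from all resolutions
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]  # refine spatial relations in-scale
        h = self.norm2(x)
        x = x + self.cross_attn(h, context, context)[0]  # align across scales
        return x


class MSCASHead(nn.Module):
    """Fuses three backbone feature maps and classifies 5 DR grades (assumed)."""

    def __init__(self, dims=(1024, 512, 256), dim=256, heads=8, num_classes=5):
        super().__init__()
        # Project each scale's channel width to a shared embedding dimension.
        self.proj = nn.ModuleList(nn.Linear(d, dim) for d in dims)
        self.fuse = nn.ModuleList(ScaleFusionBlock(dim, heads) for _ in dims)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats):
        # feats: list of (B, H_i*W_i, C_i) token sequences from backbone stages
        tokens = [p(f) for p, f in zip(self.proj, feats)]
        context = torch.cat(tokens, dim=1)  # shared context spanning all scales
        fused = [blk(t, context) for blk, t in zip(self.fuse, tokens)]
        pooled = torch.cat(fused, dim=1).mean(dim=1)  # average-pool all tokens
        return self.classifier(pooled)


if __name__ == "__main__":
    B = 2
    feats = [torch.randn(B, 12 * 12, 1024),   # coarsest Swin stage
             torch.randn(B, 24 * 24, 512),
             torch.randn(B, 48 * 48, 256)]    # finest Swin stage
    logits = MSCASHead()(feats)
    print(logits.shape)  # torch.Size([2, 5])
```

In this sketch each scale first attends to itself, then queries a concatenated context of all three scales, which is one plausible reading of "self-attention within single scales and cross-attention across multiple scales"; the paper's exact fusion topology may differ.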