Multi-Scale Transformer Architecture for Accurate Medical Image Classification
Journal: arXiv
Published Date: Feb 10, 2025
Abstract
This study introduces an AI-driven skin lesion classification algorithm built
on an enhanced Transformer architecture, addressing the challenges of accuracy
and robustness in medical image analysis. By integrating a multi-scale feature
fusion mechanism and refining the self-attention process, the model effectively
extracts both global and local features, enhancing its ability to detect
lesions with ambiguous boundaries and intricate structures. Performance
evaluation on the ISIC 2017 dataset demonstrates that the improved Transformer
surpasses established AI models, including ResNet50, VGG19, ResNeXt, and Vision
Transformer, across key metrics such as accuracy, AUC, F1-score, and precision.
Grad-CAM visualizations further highlight the interpretability of the model,
showcasing strong alignment between the algorithm's focus areas and actual
lesion sites. These results suggest that Transformer-based models can support
more accurate and reliable diagnostic tools in medical imaging.
Future work will explore the scalability of this
approach to broader medical imaging tasks and investigate the integration of
multimodal data to enhance AI-driven diagnostic frameworks for intelligent
healthcare.
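
The abstract does not specify how the multi-scale fusion or the refined self-attention is implemented, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a common pattern: a grayscale image is average-pooled at several window sizes, the pooled values from all scales are collected into one token sequence, and scaled dot-product self-attention lets fine (local) and coarse (global) tokens attend to each other. The helper names (`avg_pool`, `multi_scale_tokens`, `self_attention`), the scale set `(1, 2, 4)`, the two-component token embedding, and the use of identity query/key/value projections in place of learned ones are all assumptions made for illustration.

```python
import math


def avg_pool(image, k):
    """Average-pool a square 2D grid with a k x k window and stride k."""
    n = len(image)
    return [
        [
            sum(image[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(0, n - k + 1, k)
        ]
        for r in range(0, n - k + 1, k)
    ]


def multi_scale_tokens(image, scales=(1, 2, 4)):
    """Collect pooled values from every scale into one token list.

    Each token is [pooled intensity, normalized scale]. This 2-component
    embedding is a hypothetical stand-in for a learned patch embedding.
    """
    max_scale = max(scales)
    tokens = []
    for k in scales:
        for row in avg_pool(image, k):
            for value in row:
                tokens.append([value, k / max_scale])
    return tokens


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    A trained model would use learned projection matrices; identity
    projections are an assumption that keeps the sketch short.
    """
    d = len(tokens[0])
    fused = []
    for query in tokens:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
    return fused


# Toy 4 x 4 "image" with made-up intensities, for demonstration only.
image = [[float((r * 4 + c) % 5) for c in range(4)] for r in range(4)]
tokens = multi_scale_tokens(image)   # tokens from scales 1, 2, and 4
fused = self_attention(tokens)       # every token mixes local and global context
```

Because all scales share one token sequence, each fine-scale token can draw on the coarse summary of the whole image and vice versa, which is one plausible way a model could combine global and local features when lesion boundaries are ambiguous.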