MSA-MaxNet: Multi-Scale Attention Enhanced Multi-Axis Vision Transformer Network for Medical Image Segmentation.

Journal: Journal of Cellular and Molecular Medicine

Published Date: Dec 1, 2024

Abstract

Convolutional neural networks (CNNs) are well established in handling local features in visual tasks; yet, they falter in managing the complex spatial relationships and long-range dependencies that are crucial for medical image segmentation, particularly in identifying pathological changes. While vision transformers (ViTs) excel at capturing long-range dependencies, their ability to leverage local features remains inadequate. Recent ViT variants have incorporated CNNs to improve feature representation and segmentation outcomes, yet challenges with limited receptive fields and precise feature representation persist. In this work, we propose MSA-MaxNet. Specifically, our model utilises an encoder-decoder structure, using MaxViT blocks that apply multi-axis self-attention (Max-SA) as the encoder for local and global feature extraction. To restore the feature map's spatial resolution during upsampling operations, a symmetric MaxViT block-based decoder and upsampling layers are employed. To address the feature mismatches in the skip connections of the UNet architecture, we introduce a convolutional block attention module (CBAM). Furthermore, we design a multi-scale convolutional block attention module (MCBAM) based on CBAM, which utilises multi-scale features to enhance feature representation and refine the skip connections. We evaluate the segmentation performance of MSA-MaxNet on three publicly available medical imaging datasets: Synapse for multi-organ segmentation, ACDC for cardiac analysis and Kvasir-SEG for gastrointestinal polyp detection. Notably, MSA-MaxNet achieves state-of-the-art (SOTA) Dice scores of 85.59% and 95.26% on the Synapse and Kvasir-SEG datasets, respectively, with 40.28 M parameters. Additionally, we introduce two smaller versions of MSA-MaxNet to meet the demands of various scenarios.
In summary, our work provides a robust framework for diverse medical imaging tasks, offering potential applications in early cancer detection, cardiovascular disease diagnosis and comprehensive organ-level assessments.
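The Dice scores quoted in the abstract measure the overlap between a predicted mask and the ground-truth mask. As a rough illustration only, the sketch below computes a per-class and mean Dice score on flattened label maps in plain Python. The function names `dice_per_class` and `mean_dice`, the toy masks, and the convention of skipping classes absent from both masks are assumptions made for this example; the abstract does not describe the authors' exact evaluation protocol.

```python
from typing import Dict, Sequence


def dice_per_class(pred: Sequence[int], target: Sequence[int],
                   num_classes: int) -> Dict[int, float]:
    """Dice = 2 * |P and T| / (|P| + |T|) per foreground class.

    Label 0 is treated as background and ignored (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    scores: Dict[int, float] = {}
    for c in range(1, num_classes):
        p = sum(1 for x in pred if x == c)        # pixels predicted as class c
        t = sum(1 for y in target if y == c)      # pixels labelled as class c
        inter = sum(1 for x, y in zip(pred, target) if x == c and y == c)
        if p + t == 0:
            continue  # class absent from both masks: skipped (one common convention)
        scores[c] = 2.0 * inter / (p + t)
    return scores


def mean_dice(pred: Sequence[int], target: Sequence[int],
              num_classes: int) -> float:
    """Average the per-class Dice scores; NaN if no foreground class is present."""
    scores = dice_per_class(pred, target, num_classes)
    return sum(scores.values()) / len(scores) if scores else float("nan")


# Toy 2x4 label maps, flattened row by row (hypothetical data).
pred = [0, 1, 1, 2, 0, 1, 2, 2]
target = [0, 1, 1, 1, 0, 2, 2, 2]

print(dice_per_class(pred, target, num_classes=3))
print(mean_dice(pred, target, num_classes=3))
```

In this toy case each foreground class has three predicted pixels, three labelled pixels and two in common, so both per-class scores are 2·2/(3+3) = 2/3. Multi-organ benchmarks are typically reported as an average over classes in this way, although the exact averaging and handling of empty classes vary between papers.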

Authors

Wei Wu

Department of Pharmacy, The First Affiliated Hospital, Fujian Medical University, Fuzhou, China.
Junfeng Huang

Guangzhou Institute of Respiratory Health, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
Mingxuan Zhang

New York Genome Center, New York, NY, USA.
Yichen Li

School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing 100069, China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing 100069, China.
Qijia Yu

School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, China.
Qi Zhao

Keywords

Algorithms; Diagnostic Imaging; Humans; Image Processing, Computer-Assisted; Neural Networks, Computer

External Resources

PubMed ID (PMID): 39706821
