VcaNet: Vision Transformer with fusion channel and spatial attention module for 3D brain tumor segmentation.
Journal:
Computers in biology and medicine
Published Date:
Jan 14, 2025
Abstract
Accurate segmentation of brain tumors from MRI scans is a critical task in medical image analysis, yet it remains challenging due to the complex and variable nature of tumor shapes and sizes. Traditional convolutional neural networks (CNNs), while effective for local feature extraction, struggle to capture long-range dependencies crucial for 3D medical image analysis. To address these limitations, this paper presents VcaNet, a novel architecture that integrates a Vision Transformer (ViT) with a fusion channel and spatial attention module (CBAM), aimed at enhancing 3D brain tumor segmentation. The encoder of VcaNet employs a 3D enhanced convolution (ENCO) module to capture local volumetric features, while a Vision Transformer and multi-scale feature fusion module are incorporated in the bottleneck to capture global dependencies. Additionally, a CBAM is introduced in the decoder to further improve the integration of local and global features, enhancing segmentation accuracy. Extensive experiments on the two public BraTS Datasets demonstrate that VcaNet outperforms existing models, particularly in handling the complex spatial structures of brain tumors. This approach provides valuable insights for improving brain tumor segmentation, and its performance in 3D tasks surpasses that of 2D models, laying a foundation for future advancements in medical imaging.