Mamba Based Feature Extraction And Adaptive Multilevel Feature Fusion For 3D Tumor Segmentation From Multi-modal Medical Image
Journal: arXiv
Published Date: Apr 30, 2025
Abstract
Multi-modal 3D medical image segmentation aims to accurately identify tumor
regions across different modalities, facing challenges from variations in image
intensity and tumor morphology. Traditional convolutional neural network
(CNN)-based methods struggle to capture global features, while
Transformer-based methods, despite capturing global context effectively,
incur high computational costs in 3D medical image segmentation. The Mamba
model combines linear scalability with long-distance modeling, making it a
promising approach for visual representation learning. However, Mamba-based 3D
multi-modal segmentation still struggles to leverage modality-specific features
and fuse complementary information effectively. In this paper, we propose a
Mamba-based feature extraction and adaptive multilevel feature fusion method
for 3D tumor segmentation from multi-modal medical images. We first develop a
modality-specific Mamba encoder to efficiently extract long-range relevant
features that represent the anatomical and pathological structures present in
each modality. Moreover, we design a bi-level synergistic integration block that
dynamically merges multi-modal and multi-level complementary features through
learned modality attention and channel attention. Lastly, the decoder combines
deep semantic information with fine-grained details to generate the tumor
segmentation map. Experimental results on medical image datasets (PET/CT and
multi-sequence MRI) show that our approach achieves competitive performance
compared to the state-of-the-art CNN, Transformer, and Mamba-based approaches.
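
To make the fusion idea concrete, the following is a minimal NumPy sketch of how modality attention followed by channel attention could merge per-modality 3D feature maps at a single level. It is an illustration under stated assumptions, not the paper's implementation: the function name `fuse_modalities`, the weight arrays `w_mod` and `w_ch`, the use of global average pooling, and the softmax and sigmoid gating are all hypothetical choices, and the multi-level (bi-level) aspect of the block is not modeled here.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_modalities(feats, w_mod, w_ch):
    """Fuse per-modality 3D features (hypothetical single-level sketch).

    feats: array of shape (M, C, D, H, W), one feature map per modality.
    w_mod: array of shape (C,), assumed weights that score each modality.
    w_ch:  array of shape (C, C), assumed weights for the channel gate.
    """
    # Global average pooling over the spatial axes -> (M, C).
    pooled = feats.mean(axis=(2, 3, 4))

    # Modality attention: one score per modality, normalized over M.
    mod_weights = softmax(pooled @ w_mod, axis=0)  # shape (M,)

    # Weighted sum over the modality axis -> (C, D, H, W).
    fused = np.tensordot(mod_weights, feats, axes=(0, 0))

    # Channel attention: a gate in (0, 1) for each channel of the fused map.
    channel_desc = fused.mean(axis=(1, 2, 3))      # shape (C,)
    channel_gate = sigmoid(w_ch @ channel_desc)    # shape (C,)

    out = fused * channel_gate[:, None, None, None]
    return out, mod_weights, channel_gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two modalities (e.g. PET and CT), 4 channels, a 3x3x3 feature volume.
    feats = rng.standard_normal((2, 4, 3, 3, 3))
    out, mod_weights, channel_gate = fuse_modalities(
        feats, rng.standard_normal(4), rng.standard_normal((4, 4))
    )
    print(out.shape, mod_weights.sum())
```

In this sketch the modality weights sum to one, so the fused map is a convex combination of the modality features, and the channel gate then rescales each channel independently. In a trained network the weights would be learned rather than sampled at random.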