Multi-Modal Brain Tumor Segmentation via 3D Multi-Scale Self-attention and Cross-attention
Journal: arXiv
Published Date: Apr 12, 2025
Abstract
Following the success of CNN-based and Transformer-based models in various
computer vision tasks, recent work has studied the applicability of
CNN-Transformer hybrid architectures to 3D multi-modality medical segmentation
tasks. Through the self-attention mechanism, the Transformer component gives
hybrid models the ability to model long-range dependencies in 3D medical images.
However, these models usually employ fixed receptive fields of 3D volumetric
features within each self-attention layer, ignoring the multi-scale volumetric
lesion features. To address this issue, we propose a CNN-Transformer hybrid 3D
medical image segmentation model, named TMA-TransBTS, based on an
encoder-decoder structure. TMA-TransBTS realizes simultaneous extraction of
multi-scale 3D features and modeling of long-distance dependencies by
multi-scale division and aggregation of 3D tokens in a self-attention layer.
Furthermore, TMA-TransBTS introduces a 3D multi-scale cross-attention module to
establish a link between the encoder and the decoder for extracting rich volume
representations by exploiting the mutual attention mechanism of cross-attention
and multi-scale aggregation of 3D tokens. Extensive experiments on
three public 3D medical segmentation datasets show that TMA-TransBTS achieves
higher average segmentation performance than previous state-of-the-art CNN-based
3D methods and CNN-Transformer hybrid 3D methods for the segmentation of 3D
multi-modality brain tumors.
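
The general idea of attending over 3D tokens aggregated at several scales can be illustrated with a minimal NumPy sketch. This is not the TMA-TransBTS implementation, whose details the abstract does not give. The function names, the use of average pooling for aggregation, the scale set (1, 2, 4), and the omission of learned query/key/value projections and multiple heads are all assumptions made for illustration. Passing the same volume as query and context gives a self-attention variant; passing decoder features as query and encoder features as context gives a cross-attention variant.

```python
import numpy as np

def avg_pool3d(x, s):
    """Average-pool a (D, H, W, C) volume with a cubic window and stride s."""
    D, H, W, C = x.shape
    assert D % s == 0 and H % s == 0 and W % s == 0, "dims must be divisible by s"
    x = x.reshape(D // s, s, H // s, s, W // s, s, C)
    return x.mean(axis=(1, 3, 5))

def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_attention(query_vol, context_vol, scales=(1, 2, 4)):
    """Hypothetical sketch: queries are full-resolution tokens, while keys and
    values are context tokens pooled at several scales and concatenated.
    Learned projections are omitted (an assumption, to keep the sketch short)."""
    C = query_vol.shape[-1]
    q = query_vol.reshape(-1, C)                      # (Nq, C)
    kv = np.concatenate(
        [avg_pool3d(context_vol, s).reshape(-1, C) for s in scales], axis=0
    )                                                 # (sum of tokens per scale, C)
    attn = softmax(q @ kv.T / np.sqrt(C))             # (Nq, Nkv), rows sum to 1
    out = attn @ kv                                   # (Nq, C)
    return out.reshape(query_vol.shape), attn

# Usage on random stand-in features (not real MRI data).
rng = np.random.default_rng(0)
enc = rng.standard_normal((4, 4, 4, 8))   # stand-in encoder feature volume
dec = rng.standard_normal((4, 4, 4, 8))   # stand-in decoder feature volume

self_out, self_attn = multi_scale_attention(enc, enc)    # self-attention variant
cross_out, cross_attn = multi_scale_attention(dec, enc)  # cross-attention variant
```

With a 4x4x4 volume and scales (1, 2, 4), each query attends over 64 + 8 + 1 = 73 context tokens, so coarse tokens summarizing larger regions sit alongside fine ones in a single attention layer.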