Vision Mamba and xLSTM-UNet for medical image segmentation.

Journal: Scientific Reports

Abstract

Deep learning-based medical image segmentation methods are generally divided into convolutional neural networks (CNNs) and Transformer-based models. Traditional CNNs are limited by their receptive field, making it challenging to capture long-range dependencies; Transformers excel at modeling global information, but their high computational complexity restricts their practical application in clinical scenarios. To address these limitations, this study introduces VMAXL-UNet, a novel segmentation network that integrates structured state space models (SSMs) and the extended LSTM (xLSTM). The network incorporates Visual State Space (VSS) and Vision-LSTM (ViL) modules in the encoder to efficiently fuse local boundary details with global semantic context. The VSS module leverages an SSM to capture long-range dependencies and extract critical features from distant regions, while the ViL module employs a gating mechanism to enhance the integration of local and global features, thereby improving segmentation accuracy and robustness. Experiments on the ISIC17, ISIC18, CVC-ClinicDB, and Kvasir datasets demonstrate that VMAXL-UNet significantly outperforms traditional CNN- and Transformer-based models in capturing lesion boundaries and their long-range correlations. These results highlight the model's superior performance and provide a promising approach for efficient segmentation in complex medical imaging scenarios.
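To make the described architecture concrete, the following is a minimal PyTorch sketch of one hybrid encoder stage, reflecting the general design outlined in the abstract rather than the authors' exact implementation. The names SimpleSSM, VSSBlock, ViLBlock, and EncoderStage are hypothetical; the sequential diagonal recurrence stands in for the Mamba-style selective scan of the VSS module, and a gated nn.LSTM pass stands in for the mLSTM cells of xLSTM in the ViL module.

```python
# Hypothetical sketch of a VSS + ViL hybrid encoder stage; not the
# paper's actual code. Module names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    """Diagonal state-space recurrence over a token sequence.

    Simplified stand-in for the selective-scan core of a VSS module;
    a faithful version would use input-dependent parameters and a
    parallel scan instead of this Python loop.
    """
    def __init__(self, dim, state_dim=16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(dim, state_dim))       # decay rates
        self.B = nn.Parameter(torch.randn(dim, state_dim) * 0.1)  # input map
        self.C = nn.Parameter(torch.randn(dim, state_dim) * 0.1)  # readout map

    def forward(self, x):                       # x: (batch, seq, dim)
        b, s, d = x.shape
        h = x.new_zeros(b, d, self.A.shape[1])  # per-channel hidden state
        decay = torch.exp(self.A)               # values in (0, 1]
        out = []
        for t in range(s):
            h = h * decay + x[:, t, :, None] * self.B
            out.append((h * self.C).sum(-1))    # read out per channel
        return torch.stack(out, dim=1)

class VSSBlock(nn.Module):
    """VSS-style block: norm -> SSM over flattened spatial tokens -> residual."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ssm = SimpleSSM(dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        return x + self.ssm(self.norm(x))

class ViLBlock(nn.Module):
    """ViL-style block: gated LSTM pass standing in for the mLSTM of xLSTM."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.gate = nn.Linear(dim, dim)         # sigmoid output gating

    def forward(self, x):
        h, _ = self.lstm(self.norm(x))
        return x + h * torch.sigmoid(self.gate(x))

class EncoderStage(nn.Module):
    """One hypothetical VMAXL-UNet encoder stage: VSS, then ViL, then downsample."""
    def __init__(self, dim):
        super().__init__()
        self.vss = VSSBlock(dim)
        self.vil = ViLBlock(dim)
        self.down = nn.Conv2d(dim, dim * 2, kernel_size=2, stride=2)

    def forward(self, x):                       # x: (batch, dim, H, W)
        b, c, hgt, wid = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (batch, H*W, dim)
        tokens = self.vil(self.vss(tokens))     # global SSM + gated LSTM fusion
        skip = tokens.transpose(1, 2).reshape(b, c, hgt, wid)
        return self.down(skip), skip            # downsampled features + skip

stage = EncoderStage(dim=32)
feats, skip = stage(torch.randn(1, 32, 64, 64))
print(feats.shape, skip.shape)  # (1, 64, 32, 32), (1, 32, 64, 64)
```

Each stage returns both downsampled features and a full-resolution skip tensor for a U-Net-style decoder. Note that published VSS blocks typically cross-scan the 2D token grid along several directions; that multi-directional scan is omitted here for brevity.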

Authors

  • Xin Zhong
    Pancreatic Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China.
  • Gehao Lu
    School of Information Science and Engineering, Yunnan University, Yunnan, 650504, China. gehaolu@163.com.
  • Hao Li
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.