VMKLA-UNet: vision Mamba with KAN linear attention U-Net.
Journal:
Scientific Reports
Published Date:
Apr 17, 2025
Abstract
In the domain of medical image segmentation, while convolutional neural networks (CNNs) and Transformer-based architectures have attained notable success, they continue to face substantial challenges. CNNs are often limited in their ability to capture long-range dependencies, while Transformer models are frequently constrained by significant computational overhead. Recently, the Vision Mamba model, combined with KAN linear attention, has emerged as a highly promising alternative. In this study, we propose a novel model for medical image segmentation, termed VMKLA-UNet. The encoder of this architecture harnesses the VMamba framework, which employs a bidirectional state-space model for global visual context modeling and positional embedding, thus enabling efficient feature extraction and representation learning. For the decoder, we introduce the MKCSA architecture, which incorporates KAN linear attention (rooted in the Mamba framework) alongside a channel-spatial attention mechanism. KAN linear attention substantially reduces computational complexity while enhancing the model's capacity to focus on salient regions of interest, thereby facilitating efficient global context comprehension. The channel attention mechanism dynamically modulates the importance of each feature channel, accentuating critical features and bolstering the model's ability to differentiate between various tissue types or lesion areas. Concurrently, the spatial attention mechanism refines the model's focus on key regions within the image, enhancing segmentation boundary accuracy and detail resolution. This synergistic integration of channel and spatial attention mechanisms augments the model's adaptability, leading to superior segmentation performance across diverse lesion types.
Extensive experiments on public datasets, including Polyp, ISIC 2017, ISIC 2018, PH2, and Synapse, demonstrate that VMKLA-UNet consistently achieves high segmentation accuracy and robustness, establishing it as a highly effective solution for medical image segmentation tasks.
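
The channel-spatial attention idea described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the paper's MKCSA implementation: the function names, the single-layer channel gate, the `weights` matrix, and the `w_avg`/`w_max` mixing coefficients are all assumptions made for illustration, and the spatial gate uses a per-pixel weighted sum where published channel-spatial modules typically use a convolution. The sketch only shows the general pattern of first re-weighting channels and then re-weighting spatial positions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, weights):
    """Scale each channel of a C x H x W nested-list feature map by a gate in (0, 1).

    `weights` is a hypothetical C x C matrix standing in for learned parameters.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    # Excite: one linear layer plus sigmoid yields a per-channel importance gate.
    gates = [sigmoid(sum(w * p for w, p in zip(w_row, pooled))) for w_row in weights]
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feat)]
    return scaled, gates


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Scale each spatial position by a gate built from cross-channel mean and max.

    Simplification (assumption): a per-pixel weighted sum replaces the
    convolution normally applied to the stacked mean/max maps.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    gate = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [feat[c][i][j] for c in range(C)]
            gate[i][j] = sigmoid(w_avg * sum(vals) / C + w_max * max(vals))
    out = [[[feat[c][i][j] * gate[i][j] for j in range(W)] for i in range(H)]
           for c in range(C)]
    return out, gate


# Toy 2-channel, 2x2 feature map; the identity weights are illustrative only.
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
weights = [[1.0, 0.0], [0.0, 1.0]]

channel_refined, ch_gates = channel_attention(feat, weights)
refined, sp_gate = spatial_attention(channel_refined)
```

In this toy input the bottom-right pixel has the strongest response in both channels, so it receives the largest spatial gate, which mirrors the abstract's description of spatial attention sharpening focus on key regions after channel attention has re-weighted the feature channels.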