MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation
Journal: arXiv
Published Date: Jan 13, 2025
Abstract
Ultrasound imaging frequently suffers from elevated noise levels, diminished
spatiotemporal resolution, and complex anatomical structures. These factors
significantly hinder a model's ability to accurately capture and analyze
structural relationships and dynamic patterns across various regions of the
heart. Mamba is an emerging, cutting-edge model that has been widely applied
to diverse vision and language tasks. Building on Mamba, this paper introduces
a U-shaped deep learning
model incorporating a large-window Mamba scale (LMS) module and a hierarchical
feature fusion approach for echocardiographic segmentation. First, a cascaded
residual block serves as the encoder and incrementally extracts detailed
multiscale features. Second, the LMS module is integrated into the decoder to
capture global dependencies across regions and to improve the segmentation of
complex anatomical structures.
Furthermore, the model applies an auxiliary loss at each decoder layer and
uses a dual attention mechanism to fuse multilayer features both spatially
and across channels, which improves accuracy in delineating complex
anatomical structures. Finally, the
experimental results on the EchoNet-Dynamic and CAMUS datasets demonstrate
that the model outperforms other methods in both accuracy and robustness. For
the segmentation of the left ventricular endocardium (${LV}_{endo}$), the
model achieved best values of 95.01 and 93.36, respectively, while for the
left ventricular epicardium (${LV}_{epi}$), it achieved values of 87.35 and
87.80, respectively. These results represent improvements of between 0.54 and
1.11 over the best-performing competing model.
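
The auxiliary losses at each decoder layer can be illustrated with a minimal deep-supervision sketch. The abstract does not specify the loss function, the per-layer weights, or how targets are matched to each decoder resolution, so the binary cross-entropy, the weights, the nearest-neighbour downsampling, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import math


def downsample_mask(mask, factor):
    """Nearest-neighbour downsampling of a 2D binary mask by an integer factor."""
    return [row[::factor] for row in mask[::factor]]


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two 2D grids of equal shape."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def deep_supervision_loss(layer_preds, mask, weights):
    """Weighted sum of per-layer losses (hypothetical weighting scheme).

    layer_preds: list of 2D probability maps, one per decoder layer, each at
                 a resolution that divides the mask resolution evenly.
    mask:        full-resolution 2D binary ground-truth mask.
    weights:     one scalar weight per decoder layer (assumed values).
    """
    full_size = len(mask)
    loss = 0.0
    for pred, w in zip(layer_preds, weights):
        factor = full_size // len(pred)
        loss += w * bce(pred, downsample_mask(mask, factor))
    return loss


# Toy example: a 4x4 mask supervised at full and half resolution.
mask = [[0, 0, 1, 1]] * 4
full_res = [[0.1, 0.2, 0.8, 0.9]] * 4   # decoder output at 4x4
half_res = [[0.3, 0.7]] * 2             # deeper decoder output at 2x2
total = deep_supervision_loss([full_res, half_res], mask, weights=[1.0, 0.5])
```

In this sketch, every decoder layer contributes a gradient signal through its own term in the sum, which is the general idea behind supervising intermediate outputs; how the paper combines these terms with the dual attention fusion is not stated in the abstract.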