MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation
Journal: arXiv
Published Date: Jun 30, 2025
Abstract
Medical image segmentation plays a crucial role in clinical diagnosis and
treatment planning, where accurate boundary delineation is essential for
precise lesion localization, organ identification, and quantitative assessment.
In recent years, deep learning-based methods have significantly advanced
segmentation accuracy. However, two major challenges remain. First, the
performance of these methods relies heavily on large-scale annotated datasets,
which are often difficult to obtain in medical scenarios due to privacy
concerns and high annotation costs. Second, clinically challenging scenarios,
such as low contrast in certain imaging modalities and blurry lesion boundaries
caused by malignancy, still pose obstacles to precise segmentation. To address
these challenges, we propose MedSAM-CA, an architecture-level fine-tuning
approach that mitigates reliance on extensive manual annotations by adapting
the pretrained foundation model, Medical Segment Anything (MedSAM). MedSAM-CA
introduces two key components: the Convolutional Attention-Enhanced Boundary
Refinement Network (CBR-Net) and the Attention-Enhanced Feature Fusion Block
(Atte-FFB). CBR-Net runs in parallel with the MedSAM encoder and uses hierarchical
convolutional processing to recover boundary information that long-range attention
mechanisms may overlook. Atte-FFB, embedded in the
MedSAM decoder, fuses multi-level fine-grained features from skip connections
in CBR-Net with global representations upsampled within the decoder to enhance
boundary delineation accuracy. Experiments on publicly available datasets
covering dermoscopy, CT, and MRI imaging modalities validate the effectiveness
of MedSAM-CA. On the dermoscopy dataset, MedSAM-CA achieves a Dice score of 94.43% using
only 2% of the full training data, which corresponds to 97.25% of the performance obtained
with full-data training and demonstrates strong effectiveness in low-resource clinical settings.
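The abstract describes Atte-FFB only at a high level: it fuses fine-grained skip-connection features from CBR-Net with global features upsampled inside the MedSAM decoder. The NumPy sketch below illustrates one plausible form of such a fusion step under stated assumptions; it is not the authors' implementation. The function names (`upsample_nearest`, `channel_attention`, `fuse_block`), the nearest-neighbour upsampling, the channel-wise concatenation and the squeeze-and-excitation style channel attention are all hypothetical choices made for illustration, since the abstract does not specify the internal design of Atte-FFB.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation style channel reweighting.

    Assumed stand-in for the attention inside Atte-FFB (hypothetical).
    """
    squeezed = x.mean(axis=(1, 2))                # (C,) global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)       # (C // r,) ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (C,) sigmoid gates in (0, 1)
    return x * gates[:, None, None]               # scale each channel by its gate


def fuse_block(decoder_feat: np.ndarray, skip_feat: np.ndarray,
               w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Hypothetical fusion step.

    Upsample coarse decoder features to the skip resolution, concatenate them
    with fine-grained CBR-Net skip features along the channel axis, then
    reweight the channels with attention.
    """
    factor = skip_feat.shape[1] // decoder_feat.shape[1]
    up = upsample_nearest(decoder_feat, factor)
    if up.shape[1:] != skip_feat.shape[1:]:
        raise ValueError("spatial sizes do not match after upsampling")
    fused = np.concatenate([up, skip_feat], axis=0)  # (C_dec + C_skip, H, W)
    return channel_attention(fused, w1, w2)


# Usage with random toy tensors (shapes are illustrative assumptions).
rng = np.random.default_rng(0)
decoder_feat = rng.standard_normal((8, 16, 16))  # coarse, global features
skip_feat = rng.standard_normal((4, 64, 64))     # fine, boundary-level features
channels = 8 + 4
w1 = rng.standard_normal((channels // 4, channels))
w2 = rng.standard_normal((channels, channels // 4))
out = fuse_block(decoder_feat, skip_feat, w1, w2)
print(out.shape)  # (12, 64, 64)
```

Concatenation keeps the boundary detail from the convolutional branch intact while the attention gates decide, per channel, how much of each source to pass on; additive fusion or spatial attention would be equally reasonable assumptions.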
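The headline dermoscopy result is reported as a Dice score. For a predicted binary mask P and a ground-truth mask G, Dice = 2|P ∩ G| / (|P| + |G|). The pure-Python sketch below computes it for flattened masks. The function name `dice` and the smoothing term `eps` are illustrative assumptions, and the abstract does not state whether the reported 94.43% is averaged per image or pooled over the whole test set.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice coefficient 2|P ∩ G| / (|P| + |G|) for flattened binary masks.

    `eps` is an assumed smoothing term so that two empty masks score 1.0
    instead of raising ZeroDivisionError.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, g in zip(pred, target) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in target if g)
    return (2.0 * intersection + eps) / (total + eps)


# Toy 2x3 masks flattened row by row (hypothetical data).
prediction = [1, 1, 0,
              0, 1, 0]
ground_truth = [1, 1, 0,
                0, 0, 0]
print(round(dice(prediction, ground_truth), 4))  # 0.8
```

Under the natural reading of the abstract, the 97.25% figure is the ratio of the Dice score obtained with 2% of the training data to the Dice score obtained with all of it.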