MGD-SAM2: Multi-view Guided Detail-enhanced Segment Anything Model 2 for High-Resolution Class-agnostic Segmentation
Journal: arXiv
Published Date: Mar 31, 2025
Abstract
Segment Anything Models (SAMs), as vision foundation models, have
demonstrated remarkable performance across various image analysis tasks.
Despite their strong generalization capabilities, SAMs struggle with
fine-grained detail in high-resolution class-agnostic segmentation (HRCS).
The causes are their limited ability to process high-resolution inputs
directly, their low-resolution mask predictions, and their reliance on
accurate manual prompts. To address these limitations, we propose MGD-SAM2,
which integrates SAM2 with multi-view feature interaction between a global
image and local patches to achieve precise segmentation. MGD-SAM2 incorporates
the pre-trained SAM2 with four novel modules: the Multi-view Perception Adapter
(MPAdapter), the Multi-view Complementary Enhancement Module (MCEM), the
Hierarchical Multi-view Interaction Module (HMIM), and the Detail Refinement
Module (DRM). Specifically, we first introduce MPAdapter to adapt the SAM2
encoder for enhanced extraction of local details and global semantics in HRCS
images. Then, MCEM and HMIM are proposed to further exploit local texture and
global context by aggregating multi-view features within and across
scales. Finally, DRM is designed to restore high-resolution mask predictions
progressively, compensating for the fine-grained details lost when
low-resolution prediction maps are upsampled directly.
Experimental results demonstrate the superior performance and strong
generalization of our model on multiple high-resolution and normal-resolution
datasets. Code will be available at https://github.com/sevenshr/MGD-SAM2.
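
The multi-view input described above, one global image plus local patches, can be illustrated with a minimal sketch. It is not taken from the MGD-SAM2 code. The function name `make_multiview`, the 2 x 2 patch grid, and the average-pooling downscale are all assumptions made for illustration; the abstract does not specify how the views are constructed.

```python
import numpy as np


def make_multiview(image: np.ndarray, grid: int = 2):
    """Return (global_view, patches) for an H x W x C image.

    Hypothetical helper: the grid size and the average-pooling downscale
    are illustrative assumptions, not details from the paper.
    """
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("height and width must be divisible by grid")
    ph, pw = h // grid, w // grid

    # Local views: non-overlapping full-resolution crops in row-major order.
    patches = (
        image.reshape(grid, ph, grid, pw, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(grid * grid, ph, pw, c)
    )

    # Global view: average-pool the whole image down to the size of one patch,
    # so every view has the same spatial shape.
    global_view = image.reshape(ph, grid, pw, grid, c).mean(axis=(1, 3))

    return global_view, patches


if __name__ == "__main__":
    image = np.arange(16, dtype=float).reshape(4, 4, 1)
    global_view, patches = make_multiview(image)
    print(global_view.shape, patches.shape)
```

In this sketch the global view keeps the full scene at reduced resolution while each patch keeps full-resolution detail for one region. That is the kind of complementary input a multi-view design would feed to a shared encoder.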