Segment Anything, Even Occluded
Journal: arXiv
Published Date: Mar 8, 2025
Abstract
Amodal instance segmentation, which aims to detect and segment both visible
and invisible parts of objects in images, plays a crucial role in various
applications including autonomous driving, robotic manipulation, and scene
understanding. Existing methods train front-end detectors and mask decoders
jointly, an approach that lacks flexibility and fails to
leverage the strengths of pre-existing modal detectors. To address this
limitation, we propose SAMEO, a novel framework that adapts the Segment
Anything Model (SAM) as a versatile mask decoder capable of interfacing with
various front-end detectors to enable mask prediction even for partially
occluded objects. Acknowledging the constraints of limited amodal segmentation
datasets, we introduce Amodal-LVIS, a large-scale synthetic dataset comprising
300K images derived from the modal LVIS and LVVIS datasets. This dataset
significantly expands the training data available for amodal segmentation
research. Our experimental results demonstrate that our approach, when trained
on the newly extended training data, which includes Amodal-LVIS, achieves
remarkable zero-shot performance on both the COCOA-cls and D2SA benchmarks,
highlighting its potential to generalize to unseen scenarios.
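
To make the distinction between modal (visible-only) and amodal (visible plus occluded) masks concrete, the following is a minimal Python sketch. It is not taken from the paper and is not SAMEO's implementation: the helpers `rect`, `split_amodal`, and `iou`, the pixel-set representation, and the toy scene are all hypothetical and chosen only for illustration. It shows how an object's full mask splits into visible and hidden parts under an occluder, and why a visible-only mask scores poorly against an amodal ground truth.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col); hypothetical mask representation


def rect(top: int, left: int, bottom: int, right: int) -> Set[Pixel]:
    """Pixels of an axis-aligned rectangle (bottom and right are exclusive)."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


def split_amodal(amodal: Set[Pixel], occluder: Set[Pixel]):
    """Split a full-object (amodal) mask into its visible and hidden parts."""
    visible = amodal - occluder  # modal mask: what a standard detector sees
    hidden = amodal & occluder   # occluded part an amodal model must infer
    return visible, hidden


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy scene (assumed): a 4x6 object whose two rightmost columns are covered.
amodal_gt = rect(0, 0, 4, 6)
occluder = rect(0, 4, 4, 8)

visible, hidden = split_amodal(amodal_gt, occluder)
occlusion_rate = len(hidden) / len(amodal_gt)

print(f"visible pixels: {len(visible)}, hidden pixels: {len(hidden)}")
print(f"occlusion rate: {occlusion_rate:.2f}")
print(f"IoU of visible-only mask vs. amodal ground truth: {iou(visible, amodal_gt):.2f}")
```

In this toy scene a third of the object is hidden, so a mask covering only the visible region reaches an IoU of about 0.67 against the amodal ground truth. An amodal mask decoder is expected to close that gap by also predicting the hidden region.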