Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Journal:
arXiv
Published Date:
Mar 8, 2025
Abstract
The Segment Anything Model (SAM) has shown impressive general-purpose
segmentation performance on natural images, but its performance on camouflaged
object detection (COD) is unsatisfactory. In this paper, we propose SAM-COD
that performs camouflaged object detection on RGB-D inputs. While keeping the
SAM architecture intact, we attach dual stream adapters to the image encoder to
learn potentially complementary information from RGB and depth images, and we
fine-tune the mask decoder and its depth replica to perform dual-stream mask
prediction. In practice, the dual stream adapters are embedded into the
attention block of the image encoder in a parallel manner to facilitate the
refinement and correction of the two types of image embeddings. To mitigate
channel discrepancies arising from dual stream embeddings that do not directly
interact with each other, we strengthen the association between the dual stream
embeddings using bidirectional knowledge distillation, which comprises a model
distiller and a modal distiller. In addition, to predict the masks for the RGB
and depth attention maps, we hybridize the two types of image embeddings, which
are jointly learned with the prompt embeddings to update the initial prompt, and
then feed them into the mask decoders to keep the image embeddings and prompt
embeddings consistent. Experimental results on four COD benchmarks show that
SAM-COD improves detection performance over SAM and achieves state-of-the-art
results under a given fine-tuning paradigm.
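
The parallel placement of the dual stream adapters described in the abstract can
be illustrated with a minimal NumPy sketch. This is not the authors'
implementation: the bottleneck adapter design, the zero-initialized
up-projection, the single-head attention stand-in, the sharing of frozen
attention weights between the two streams, and all names (init_adapter,
adapter_forward, frozen_attention, dual_stream_block, scale) are assumptions
made for illustration only.

```python
import numpy as np


def init_adapter(dim, bottleneck, rng):
    """Hypothetical bottleneck adapter: down-projection, ReLU, up-projection."""
    return {
        "down": rng.normal(0.0, 0.02, size=(dim, bottleneck)),
        # Assumption: zero init, so the adapter starts as a no-op.
        "up": np.zeros((bottleneck, dim)),
    }


def adapter_forward(params, x):
    """Apply the adapter to token embeddings x of shape (tokens, dim)."""
    hidden = np.maximum(x @ params["down"], 0.0)
    return hidden @ params["up"]


def frozen_attention(x, w_qkv, w_out):
    """Single-head self-attention standing in for a frozen encoder layer."""
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ w_out


def dual_stream_block(rgb, depth, attn, rgb_adapter, depth_adapter, scale=1.0):
    """Run both streams through the same frozen attention weights.

    Each stream adds the output of its own trainable adapter in parallel
    with the attention branch (assumed here to share frozen weights).
    """
    w_qkv, w_out = attn
    rgb_out = (rgb + frozen_attention(rgb, w_qkv, w_out)
               + scale * adapter_forward(rgb_adapter, rgb))
    depth_out = (depth + frozen_attention(depth, w_qkv, w_out)
                 + scale * adapter_forward(depth_adapter, depth))
    return rgb_out, depth_out
```

In this sketch only the adapter parameters would be trained; with the
up-projection at zero, each stream initially reproduces the frozen attention
block exactly, and the two streams do not exchange information inside the
block, which is the kind of gap the abstract's distillation step is meant to
address.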