ZS-VCOS: Zero-Shot Outperforms Supervised Video Camouflaged Object Segmentation
Journal: arXiv
Published Date: Apr 10, 2025
Abstract
Camouflaged object segmentation presents unique challenges compared to
traditional segmentation tasks, primarily due to the high similarity in
patterns and colors between camouflaged objects and their backgrounds.
Effective solutions to this problem have significant implications for critical
areas such as pest control, defect detection, and lesion segmentation in
medical imaging. Prior research has predominantly emphasized supervised or
unsupervised pre-training methods, leaving zero-shot approaches significantly
underdeveloped. Existing zero-shot techniques commonly utilize the Segment
Anything Model (SAM) in automatic mode or rely on vision-language models to
generate cues for segmentation; however, their performance remains
unsatisfactory, likely because camouflaged objects closely resemble their
backgrounds. Optical flow, commonly used for detecting moving objects, has
proven effective even for camouflaged objects. Our method
integrates optical flow, a vision-language model, and SAM 2 into a sequential
pipeline. On the MoCA-Mask dataset, our approach significantly outperforms
existing zero-shot methods, raising the weighted F-measure ($F_\beta^w$) from
0.296 to 0.628. Remarkably,
our approach also surpasses supervised methods, increasing the F-measure from
0.476 to 0.628. Additionally, on the MoCA-Filter dataset, our approach raises
the success rate from 0.628 (FlowSAM, a supervised transfer method) to 0.697.
A thorough ablation study further validates the contribution of each
component. More details are available at https://github.com/weathon/vcos.
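As a rough illustration of how optical flow can localize a moving camouflaged object for a promptable segmenter, the sketch below thresholds per-pixel flow magnitude and returns the bounding box of the moving pixels, which could then serve as a box prompt for SAM 2. This is not the authors' pipeline: the helper name `motion_box`, the threshold value, and the choice of a box (rather than, for example, a cue produced by the vision-language model) as the prompt are all assumptions.

```python
import math
from typing import List, Optional, Tuple

# Dense optical flow for one frame: flow[y][x] = (dx, dy) in pixels.
Flow = List[List[Tuple[float, float]]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def motion_box(flow: Flow, threshold: float = 1.0) -> Optional[Box]:
    """Hypothetical helper: bounding box of pixels whose flow magnitude exceeds `threshold`.

    The threshold is an illustrative assumption, not a value from the paper, and
    no camera-motion compensation is done (global motion would flag every pixel).
    """
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(flow):
        for x, (dx, dy) in enumerate(row):
            if math.hypot(dx, dy) > threshold:  # per-pixel motion magnitude
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no moving pixels: nothing to prompt the segmenter with
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x4 flow field with three moving pixels.
flow: Flow = [[(0.0, 0.0)] * 4 for _ in range(4)]
flow[1][1] = (2.0, 0.0)
flow[1][2] = (1.5, 1.5)
flow[2][2] = (0.0, -3.0)
print(motion_box(flow))  # → (1, 1, 2, 2), usable as a box prompt
```

In a full pipeline, such a prompt would be handed to SAM 2 and the resulting mask propagated through the video; that hand-off is omitted here because it depends on the SAM 2 interface in use.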
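The $F_\beta^w$ score is a weighted F-measure: its final step combines weighted precision $P$ and weighted recall $R$ through the standard F-beta formula $(1+\beta^2)PR / (\beta^2 P + R)$. The sketch below shows only that combination step; the error-weighting used to obtain the weighted $P$ and $R$ is not reproduced, and the default `beta=1.0` is an assumption rather than a setting stated in the abstract.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Combine precision and recall: (1 + b^2) * P * R / (b^2 * P + R).

    For F_beta^w, P and R would be the *weighted* precision and recall;
    computing those (the metric's error-weighting step) is not shown here.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        return 0.0  # both P and R are zero: define F as 0
    return (1 + b2) * precision * recall / denom


print(f_beta(0.5, 0.5))            # → 0.5
print(round(f_beta(1.0, 0.5), 4))  # → 0.6667
```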
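Because the original MoCA annotations are bounding boxes, the success rate on MoCA-Filter is commonly computed at the box level: each predicted mask is converted to its tight bounding box, compared with the ground-truth box by IoU, and the fraction of frames whose IoU exceeds a threshold is averaged over several thresholds. The sketch below follows that recipe under stated assumptions: the threshold set {0.5, 0.6, 0.7, 0.8, 0.9}, the strict `>` comparison, the treatment of empty predictions as misses, and all helper names are illustrative and not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels
Mask = List[List[int]]           # mask[y][x] in {0, 1}


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box of the foreground pixels, or None for an empty mask."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(r: Box) -> int:
    return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)


def box_iou(a: Box, b: Box) -> float:
    """IoU of two inclusive pixel boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    return inter / (box_area(a) + box_area(b) - inter)


def success_rate(pred_masks: Sequence[Mask], gt_boxes: Sequence[Box],
                 thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9)) -> float:
    """Fraction of frames with box IoU above each threshold, averaged over thresholds."""
    ious = []
    for mask, gt in zip(pred_masks, gt_boxes):
        box = mask_to_box(mask)
        ious.append(0.0 if box is None else box_iou(box, gt))  # empty mask = miss
    if not ious:
        return 0.0
    per_threshold = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(per_threshold) / len(per_threshold)


m1 = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]  # box (0, 0, 1, 1): IoU 1.0 with the target
m2 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # box (0, 0, 0, 0): IoU 0.25 with the target
print(success_rate([m1, m2], [(0, 0, 1, 1), (0, 0, 1, 1)]))  # → 0.5
```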
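For scale, the reported gains can be restated as relative improvements; this is plain arithmetic on the figures given in the abstract, rounded to two decimals:

```latex
\[
\frac{0.628 - 0.296}{0.296} \approx 1.12, \qquad
\frac{0.628 - 0.476}{0.476} \approx 0.32, \qquad
\frac{0.697 - 0.628}{0.628} \approx 0.11
\]
```

That is, roughly 112% over the best zero-shot $F_\beta^w$, 32% over the supervised $F_\beta^w$, and 11% over FlowSAM's success rate.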