Self-Disentanglement and Re-Composition for Cross-Domain Few-Shot Segmentation
Journal: arXiv
Published Date: Jun 3, 2025
Abstract
Cross-Domain Few-Shot Segmentation (CD-FSS) aims to transfer knowledge from a
source-domain dataset to unseen target-domain datasets with limited
annotations. Current methods typically compare the distance between training
and testing samples for mask prediction. However, we find an entanglement
problem exists in this widely adopted method, which tends to bind source-domain
patterns together and makes each of them hard to transfer. In this paper, we aim
to address this problem for the CD-FSS task. We first find a natural
decomposition of the ViT structure, based on which we analyze and interpret the
entanglement problem. We find that the decomposed ViT components are
cross-compared between images in distance calculation, where the reasonable
comparisons are entangled with meaningless ones because all are given
equal importance, leading to the entanglement problem. Based on this
interpretation, we further propose to address the entanglement problem by
learning to weight all comparisons of ViT components, which learns
disentangled features and re-composes them for the CD-FSS task, benefiting both
generalization and finetuning. Experiments show that our model outperforms
the state-of-the-art CD-FSS method by 1.92% and 1.88% in average accuracy under
1-shot and 5-shot settings, respectively.
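
As a rough illustration of the idea of weighting cross-comparisons, the following minimal sketch contrasts an equal-importance average over all component pairs with a weighted re-composition. It is not the paper's implementation: the abstract does not specify how the ViT is decomposed, so the sketch assumes each image yields L component vectors of dimension D, and the names `cosine_matrix`, `weighted_similarity`, and `logits` are hypothetical. In a real model the logits would be learned; here they are fixed by hand.

```python
import numpy as np


def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a (L, D) and rows of b (L, D)."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T  # shape (L, L): every component compared with every other


def weighted_similarity(support, query, logits=None):
    """Combine all L x L component comparisons into one similarity score.

    logits=None gives every comparison equal importance (the entangled case).
    Otherwise a softmax over the (L, L) logits weights each comparison
    (assumption: a stand-in for learned comparison weights).
    """
    sims = cosine_matrix(support, query)
    if logits is None:
        weights = np.full(sims.shape, 1.0 / sims.size)
    else:
        z = logits - logits.max()  # subtract max for numerical stability
        weights = np.exp(z) / np.exp(z).sum()
    return float((weights * sims).sum())


# Toy example: 4 orthogonal components, identical in support and query.
components = np.eye(4)
uniform_score = weighted_similarity(components, components)
# Hand-set logits that favor matching components (the diagonal comparisons).
weighted_score = weighted_similarity(components, components, logits=20.0 * np.eye(4))
```

In this toy case the equal-importance score is diluted by the twelve mismatched comparisons, while the weighted score concentrates on the four matching ones.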