DECNet: Dense embedding contrast for unsupervised semantic segmentation.
Journal:
Neural networks : the official journal of the International Neural Network Society
Published Date:
Jul 20, 2024
Abstract
Unsupervised semantic segmentation is important for understanding that each pixel belongs to known categories without annotation. Recent studies have demonstrated promising outcomes by employing a vision transformer backbone pre-trained on an image-level dataset in a self-supervised manner. However, those methods always depend on complex architectures or meticulously designed inputs. Naturally, we are attempting to explore the investment with a straightforward approach. To prevent over-complication, we introduce a simple Dense Embedding Contrast network (DECNet) for unsupervised semantic segmentation in this paper. Specifically, we propose a Nearest Neighbor Similarity strategy (NNS) to establish well-defined positive and negative pairs for dense contrastive learning. Meanwhile, we optimize a contrastive objective named Ortho-InfoNCE to alleviate the false negative problem inherent in contrastive learning for further enhancing dense representations. Finally, extensive experiments conducted on COCO-Stuff and Cityscapes datasets demonstrate that our approach outperforms state-of-the-art methods.