DECNet: Dense embedding contrast for unsupervised semantic segmentation.

Journal: Neural Networks: The Official Journal of the International Neural Network Society
Published Date:

Abstract

Unsupervised semantic segmentation is important for determining which category each pixel belongs to without annotation. Recent studies have demonstrated promising results by employing a vision transformer backbone pre-trained on an image-level dataset in a self-supervised manner. However, these methods generally depend on complex architectures or meticulously designed inputs. In this paper, we instead explore a straightforward approach and introduce a simple Dense Embedding Contrast network (DECNet) for unsupervised semantic segmentation. Specifically, we propose a Nearest Neighbor Similarity (NNS) strategy to establish well-defined positive and negative pairs for dense contrastive learning. In addition, we optimize a contrastive objective, Ortho-InfoNCE, that alleviates the false negative problem inherent in contrastive learning and further enhances the dense representations. Extensive experiments on the COCO-Stuff and Cityscapes datasets demonstrate that our approach outperforms state-of-the-art methods.
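To make the idea of dense contrastive learning with nearest-neighbor pairs concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify how NNS or Ortho-InfoNCE are defined, so the sketch uses the standard InfoNCE loss and a plain cosine nearest-neighbor rule for choosing positives. All function names (`cosine_matrix`, `nearest_neighbor_positives`, `info_nce`), the temperature value, and the toy embeddings are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(anchors, candidates):
    """Cosine similarity between every anchor and every candidate embedding."""
    a = [normalize(v) for v in anchors]
    c = [normalize(v) for v in candidates]
    return [[sum(x * y for x, y in zip(u, w)) for w in c] for u in a]


def nearest_neighbor_positives(sim):
    """Assumed pairing rule: the most similar candidate is the positive."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def info_nce(sim, pos_idx, temperature=0.1):
    """Standard InfoNCE: every candidate other than the positive is a negative."""
    total = 0.0
    for row, p in zip(sim, pos_idx):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[p]
    return total / len(sim)


# Toy "dense embeddings": 16 patch vectors from two views of the same image.
random.seed(0)
view_a = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
view_b = [[x + 0.05 * random.gauss(0, 1) for x in v] for v in view_a]

sim = cosine_matrix(view_a, view_b)
pos_idx = nearest_neighbor_positives(sim)
loss = info_nce(sim, pos_idx)
print(f"InfoNCE loss: {loss:.4f}")
```

In this plain form, every candidate that is not selected as the positive is pushed away from the anchor, even when it belongs to the same semantic category. That is the false negative problem the abstract refers to. How Ortho-InfoNCE addresses it is not described in the abstract and is not reproduced here.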

Authors

  • Xiaoqin Zhang
    Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, PR China.
  • Baiyu Chen
    Department of Radiology, Mayo Clinic, Rochester, MN, USA.
  • Xiaolong Zhou
    College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China. zxl@zjut.edu.cn.
  • Sixian Chan
    School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, 310014, China; Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, 230031, China. Electronic address: sxchan@zjut.edu.cn.