Semantic discrete decoder based on adaptive pixel clustering for monocular depth estimation.

Journal: Neural Networks: The Official Journal of the International Neural Network Society
Published Date:

Abstract

Monocular depth estimation (MDE) is a long-standing and challenging task. Mainstream methods currently fall into two groups: regression methods based on geometric constraints and ordinal regression methods based on discretized depth intervals. Both overlook the fact that depth values within an object tend to be continuous, while depth values between objects are discontinuous to varying degrees. Based on this observation, we propose a more general approach to monocular depth estimation called APCDepth. The method treats MDE as a continuous regression task rather than an ordinal regression task, which preserves the continuity of depth values within objects. To capture the discontinuity of depth values between objects, we propose an Adaptive Pixel Clustering (APC) module that semantically discretizes deep encoder features, and a Cross-Semantic Alignment (CSA) module that aligns the discretized feature maps to a larger resolution. In addition, to address the quadratic complexity that Transformer decoders introduce in depth estimation, we propose a Deformable Feature Pyramid Network (DeFPN) with sparse attention for multi-scale feature fusion. Experimental results on the KITTI and NYU datasets validate the effectiveness of APCDepth and demonstrate strong performance.
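
The idea of discretizing a feature map by grouping similar pixels can be illustrated with a small sketch. The code below is not the APC module described in the abstract, whose details are not given here. It is a plain k-means clustering over the pixels of a feature map, written with NumPy, and the function name `cluster_pixels`, its parameters, and the quantile-based initialization are all assumptions made for illustration. Each pixel is replaced by the centroid of its cluster, so features are constant within a cluster and change abruptly between clusters.

```python
import numpy as np


def cluster_pixels(features, num_clusters=2, num_iters=10):
    """Group the pixels of a (C, H, W) feature map with plain k-means.

    Hypothetical helper for illustration only; not the paper's APC module.
    Returns (labels, discretized): labels has shape (H, W), and discretized
    has shape (C, H, W) with every pixel replaced by its cluster centroid.
    """
    c, h, w = features.shape
    # One row per pixel, one column per channel: shape (H*W, C).
    pixels = features.reshape(c, h * w).T.astype(float)

    # Deterministic initialization (an assumption of this sketch): take
    # pixels at evenly spaced ranks of the per-pixel mean feature value.
    order = np.argsort(pixels.mean(axis=1))
    ranks = np.linspace(0, h * w - 1, num_clusters).astype(int)
    centroids = pixels[order[ranks]].copy()

    def assign(points, centers):
        # Squared Euclidean distance from every pixel to every centroid.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    for _ in range(num_iters):
        labels = assign(pixels, centroids)
        for k in range(num_clusters):
            members = pixels[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)

    # Final assignment, consistent with the final centroids.
    labels = assign(pixels, centroids)
    discretized = centroids[labels].T.reshape(c, h, w)
    return labels.reshape(h, w), discretized
```

Calling `cluster_pixels` on a feature map whose left and right halves hold clearly different values should assign the two halves to different clusters. A learned, adaptive module would presumably replace both the fixed cluster count and the hand-written distance used here.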

Authors

  • Xuanxuan Liu
    College of Computer Science and Technology, Qingdao University, Qingdao 266071, China.
  • Shuai Tang
    Institute of Future Technology, South China University of Technology, Guangdong 511442, China. Electronic address: stanginch@gmail.com.
  • Mingzhi Ye
    Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 518110, China. Electronic address: mingzhiye2001@gmail.com.
  • Tongwei Lu
    School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China. Electronic address: lutongwei@wit.edu.cn.
  • Lixin Duan