Semantic discrete decoder based on adaptive pixel clustering for monocular depth estimation.

Journal: Neural Networks: The Official Journal of the International Neural Network Society

Published Date: Sep 1, 2025

Abstract

Monocular depth estimation (MDE) has long been a popular and challenging task. Mainstream methods currently fall into two groups: regression methods based on geometric constraints, and ordinal regression methods based on discretized depth intervals. Both overlook the fact that depth values within an object tend to be continuous, while depth values between objects are discontinuous to varying degrees. Based on this observation, we propose a more general approach to monocular depth estimation called APCDepth. The method treats MDE as a continuous regression task rather than an ordinal regression task, which preserves the continuity of depth values within objects. To capture the discontinuity of depth values between objects, we propose an Adaptive Pixel Clustering (APC) module that semantically discretizes deep encoder features, and a Cross-Semantic Alignment (CSA) module that aligns the discretized feature maps to a higher resolution. Additionally, to address the quadratic complexity introduced by using Transformers as decoders in depth estimation, we propose a Deformable Feature Pyramid Network (DeFPN) with sparse attention for multi-scale feature fusion. Experimental results on the KITTI and NYU datasets validate the effectiveness of APCDepth and demonstrate outstanding performance.
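The quadratic-complexity point in the abstract can be made concrete with a small counting sketch. It is not the authors' DeFPN implementation. It only compares how many query-key pairs full self-attention scores on a feature map against a sparse scheme in which each query attends to a fixed number of sampled points, as deformable attention generally does. The function names, the feature-map sizes, and the budget of 4 points per query are all hypothetical choices for illustration.

```python
def dense_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs scored by full self-attention over an H x W feature map."""
    n = height * width  # number of pixel tokens
    return n * n        # every token attends to every token


def sparse_attention_pairs(height: int, width: int, points_per_query: int) -> int:
    """Query-key pairs when each query attends to a fixed number of sampled points."""
    n = height * width
    return n * points_per_query  # grows linearly with the number of tokens


if __name__ == "__main__":
    # Hypothetical feature-map sizes and sampling budget, chosen only for illustration.
    for h, w in [(12, 40), (24, 80), (48, 160)]:
        dense = dense_attention_pairs(h, w)
        sparse = sparse_attention_pairs(h, w, points_per_query=4)
        print(f"{h}x{w}: dense={dense:,} sparse={sparse:,} ratio={dense // sparse}")
```

Doubling both sides of the feature map multiplies the dense count by 16 but the sparse count by only 4, which is why sparse attention becomes more attractive at the higher-resolution levels of a feature pyramid.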

Authors

Xuanxuan Liu

College of Computer Science and Technology, Qingdao University, Qingdao 266071, China.

Shuai Tang

Institute of Future Technology, South China University of Technology, Guangdong 511442, China. Electronic address: stanginch@gmail.com.

Mingzhi Ye

Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 518110, China. Electronic address: mingzhiye2001@gmail.com.

Tongwei Lu

School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China. Electronic address: lutongwei@wit.edu.cn.

Lixin Duan

Keywords

Algorithms; Cluster Analysis; Depth Perception; Humans; Neural Networks, Computer; Semantics; Vision, Monocular

External Resources

PubMed ID: 40414148
