Real-time and accurate stereo matching via tri-fusion volume for stereo vision.
Journal: Neural Networks: the official journal of the International Neural Network Society
Published date: Sep 1, 2025
Abstract
In real-time stereo matching, a concise yet informative cost volume is crucial for achieving both high efficiency and high accuracy. To this end, we propose the Tri-Fusion Volume (TFV), which fuses texture details and similarity information through three distinct volumes: a texture volume, a correlation volume, and a mutual volume. Texture information, especially in ill-posed regions, tends to be lost as it propagates through deep networks. To preserve it, the texture volume encodes the initial stereo images with low-level features, allowing essential texture details to be recovered. In contrast to previously widely adopted strategies, we are the first to introduce deformable attention into the construction of the correlation volume, so that matching pixel pairs are searched adaptively. We also propose the mutual volume, which measures the similarity between their probability distributions based on mutual information. TFV serves as a lightweight plug-in module that significantly improves performance when integrated into existing real-time methods. Building on the TFV framework, we further propose TCMNet, a real-time and accurate stereo matching model. We systematically evaluate the effectiveness of TFV and TCMNet. The results show that previous models improve markedly when equipped with TFV, and that TCMNet achieves leading performance on the Scene Flow, KITTI-2012, and KITTI-2015 benchmarks.
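For readers unfamiliar with the two similarity cues the abstract builds on, the sketch below shows a conventional correlation cost volume and a histogram-based mutual-information score per candidate disparity. It is a minimal illustration under stated assumptions, not the paper's TFV. It uses Python with NumPy. The function names `correlation_volume`, `mutual_information`, and `mi_per_disparity` are hypothetical. It replaces deformable attention with an exhaustive fixed-offset search, omits the texture volume entirely, and uses the classic intensity-histogram MI estimate rather than the paper's mutual volume, whose exact formulation is not given in the abstract.

```python
import numpy as np


def correlation_volume(feat_l: np.ndarray, feat_r: np.ndarray, max_disp: int) -> np.ndarray:
    """Plain correlation cost volume of shape (max_disp, H, W).

    Left pixel (y, x) is compared with right pixel (y, x - d); columns with
    x < d have no valid match and stay at zero. This is the conventional
    fixed-offset search, NOT the deformable-attention search described for TFV.
    """
    _, h, w = feat_l.shape
    vol = np.zeros((max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        # Channel-averaged dot product between left features and right features shifted by d.
        vol[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, : w - d]).mean(axis=0)
    return vol


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Plug-in histogram estimate of the mutual information (in nats) between two equally sized arrays."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = p_ab > 0  # skip empty cells to avoid log(0)
    return float((p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])).sum())


def mi_per_disparity(img_l: np.ndarray, img_r: np.ndarray, max_disp: int, bins: int = 16) -> np.ndarray:
    """MI between a left image (H, W) and the right image shifted by each candidate disparity."""
    w = img_l.shape[1]
    return np.array(
        [mutual_information(img_l[:, d:], img_r[:, : w - d], bins) for d in range(max_disp)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 3  # planted disparity for the synthetic pair (illustrative only)
    left = rng.normal(size=(64, 20, 40))  # (C, H, W) stand-in "features"
    right = np.roll(left, -k, axis=2)  # right(x) = left(x + k), so left(x) matches right(x - k)
    vol = correlation_volume(left, right, max_disp=8)
    # Winner-take-all accuracy on columns where every disparity has a valid match.
    print("WTA accuracy:", (vol[:, :, 8:].argmax(axis=0) == k).mean())
    print("MI per disparity:", np.round(mi_per_disparity(left[0], right[0], max_disp=8), 3))
```

On the synthetic pair, both cues peak at the planted disparity `k`. Computing MI once per disparity over the whole image is a simplification for brevity; a per-pixel volume would need local windows or learned per-pixel distributions, which is where a design like the paper's mutual volume would differ.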