VOCAL: Visual Odometry via ContrAstive Learning

Journal: arXiv

Published Date: Jun 30, 2025

Abstract

Breakthroughs in visual odometry (VO) have fundamentally reshaped the landscape of robotics, enabling ultra-precise camera state estimation that is crucial for modern autonomous systems. Despite these advances, many learning-based VO techniques rely on rigid geometric assumptions, which often fall short in interpretability and lack a solid theoretical basis within fully data-driven frameworks. To overcome these limitations, we introduce VOCAL (Visual Odometry via ContrAstive Learning), a novel framework that reimagines VO as a label ranking challenge. By integrating Bayesian inference with a representation learning framework, VOCAL organizes visual features to mirror camera states. The ranking mechanism compels similar camera states to converge into consistent and spatially coherent representations within the latent space. This strategic alignment not only bolsters the interpretability of the learned features but also ensures compatibility with multimodal data sources. Extensive evaluations on the KITTI dataset highlight VOCAL's enhanced interpretability and flexibility, pushing VO toward more general and explainable spatial intelligence.
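The abstract describes the ranking idea only at a high level, so the following is a minimal sketch of what a label-ranking contrastive objective can look like, not the loss used in VOCAL. It is a generic rank-based formulation. For an anchor sample i and another sample j, the embedding similarity of (i, j) is contrasted against every sample that is at least as far from i in camera-state space as j is. The function name `rank_contrastive_loss`, the `temperature` parameter, the cosine similarity, and the Euclidean distance between camera-state labels are all assumptions made for illustration. The Bayesian inference component mentioned in the abstract is not modeled here.

```python
import numpy as np


def rank_contrastive_loss(z, y, temperature=0.1):
    """Generic rank-based contrastive loss (illustrative, not the paper's loss).

    z: (N, D) array of visual feature embeddings (hypothetical encoder output).
    y: (N, K) array of camera-state labels, e.g. relative translation/rotation.
    """
    # Cosine similarity between all pairs of embeddings, scaled by temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # Pairwise distance between camera states (Euclidean is an assumption).
    dist = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Candidates: samples at least as far from anchor i as j is.
            # j itself always qualifies, so the set is never empty.
            mask = not_self[i] & (dist[i] >= dist[i, j])
            logits = sim[i, mask]
            # Numerically stable log-sum-exp over the candidate set.
            m = logits.max()
            log_denom = m + np.log(np.exp(logits - m).sum())
            total += log_denom - sim[i, j]
    return total / (n * (n - 1))
```

Under these assumptions, the loss is small when samples with similar camera states are also the most similar in the latent space, and it grows when that ordering is violated. That matches the "similar camera states converge" behavior the abstract describes.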

Authors

Chi-Yao Huang
Zeel Bhatt
Yezhou Yang

External Resources

View on arXiv (http://arxiv.org/abs/2507.00243v1)
