A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision
Journal: arXiv
Published Date: Feb 11, 2025
Abstract
Omnidirectional image (ODI) data is captured with a field-of-view of 360°×180°,
which is much wider than that of pinhole cameras, and captures richer details of
the surrounding environment than conventional perspective images. In recent years,
the availability of consumer-level 360° cameras has made omnidirectional vision
more popular, and advances in deep learning (DL) have significantly accelerated
its research and applications. This paper presents a systematic and
comprehensive review and analysis of the recent progress of DL for
omnidirectional vision. It delineates the distinct challenges and complexities
encountered in applying DL to omnidirectional images as opposed to traditional
perspective imagery. Our work covers five main components: (i) A thorough
introduction to the principles of omnidirectional imaging and commonly explored
projections of ODI; (ii) A methodical review of varied representation learning
approaches tailored for ODI; (iii) An in-depth investigation of optimization
strategies specific to omnidirectional vision; (iv) A structured and
hierarchical taxonomy of DL methods for representative omnidirectional
vision tasks, from visual enhancement (e.g., image generation and
super-resolution) to 3D geometry and motion estimation (e.g., depth and optical
flow estimation), alongside a discussion of emerging research directions;
(v) An overview of cutting-edge applications (e.g., autonomous driving and
virtual reality), coupled with a critical discussion of prevailing challenges
and open questions, to stimulate further research in the community.