Deep Learning Reforms Image Matching: A Survey and Outlook
Journal: arXiv
Published Date: Jun 5, 2025
Abstract
Image matching, which establishes correspondences between two-view images to
recover 3D structure and camera geometry, serves as a cornerstone in computer
vision and underpins a wide range of applications, including visual
localization, 3D reconstruction, and simultaneous localization and mapping
(SLAM). Traditional pipelines composed of "detector-descriptor, feature
matcher, outlier filter, and geometric estimator" falter in challenging
scenarios. Recent deep-learning advances have significantly boosted both
robustness and accuracy. This survey adopts a unique perspective by
comprehensively reviewing how deep learning has incrementally transformed the
classical image matching pipeline. Our taxonomy closely aligns with the
traditional pipeline in two key aspects: i) the replacement of individual steps
in the traditional pipeline with learnable alternatives, including learnable
detector-descriptors, outlier filters, and geometric estimators; and ii) the
merging of multiple steps into end-to-end learnable modules, encompassing
middle-end sparse matchers, end-to-end semi-dense/dense matchers, and pose
regressors. We first examine the design principles, advantages, and limitations
of both aspects, and then benchmark representative methods on relative pose
recovery, homography estimation, and visual localization tasks. Finally, we
discuss open challenges and outline promising directions for future research.
By systematically categorizing and evaluating deep learning-driven strategies,
this survey offers a clear overview of the evolving image matching landscape
and highlights key avenues for further innovation.
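
To make the "feature matcher" stage of the traditional pipeline concrete, the following is a minimal illustrative sketch, not a method taken from the survey. It assumes descriptors are plain lists of floats and applies two common hand-crafted heuristics: a nearest-neighbor ratio test and a mutual nearest-neighbor check. The function names `l2` and `match_descriptors` and the threshold `ratio=0.8` are hypothetical choices made for this illustration.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptors given as float sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Hypothetical classical matcher: ratio test plus mutual nearest neighbor.

    Returns a list of (i, j) index pairs, where desc_a[i] is matched to desc_b[j].
    """
    def ranked(query, pool):
        # Candidates in `pool` sorted by increasing distance to `query`.
        return sorted((l2(query, p), j) for j, p in enumerate(pool))

    matches = []
    for i, d in enumerate(desc_a):
        candidates = ranked(d, desc_b)
        if len(candidates) < 2:
            continue  # the ratio test needs a second-best candidate
        (best, j), (second, _) = candidates[0], candidates[1]
        if best >= ratio * second:
            continue  # ambiguous: best match is not clearly better than second
        # Mutual check: desc_b[j] must also pick desc_a[i] as its nearest neighbor.
        if ranked(desc_b[j], desc_a)[0][1] == i:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
    b = [[10.1, 10.0], [0.0, 0.1], [100.0, 100.0]]
    print(match_descriptors(a, b))
```

In this toy input the third descriptor in `a` lies roughly equidistant from two candidates in `b`, so the ratio test rejects it. Ambiguous cases of this kind are what a downstream outlier filter and geometric estimator would otherwise have to absorb.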