CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Journal:
arXiv
Published Date:
Mar 31, 2025
Abstract
This paper proposes CoMatch, a novel semi-dense image matcher
with dynamic covisibility awareness and bilateral subpixel accuracy. Firstly,
because neighboring tokens have similar representations, modeling context
interaction over the entire coarse feature map incurs highly redundant
computation. A covisibility-guided token condenser is therefore introduced to
adaptively aggregate tokens according to dynamically estimated covisibility
scores, improving both computational efficiency and the representational
capacity of the aggregated tokens.
Secondly, because feature interaction with large non-covisible areas is
distracting and may degrade feature distinctiveness, a
covisibility-assisted attention mechanism is deployed to selectively suppress
irrelevant messages broadcast from non-covisible reduced tokens, yielding
robust and compact attention over relevant tokens rather than all tokens. Thirdly, we find
that at the fine-level stage, current methods adjust only the target view's
keypoints to subpixel level, while those in the source view remain restricted
to the coarse level and are thus not informative enough, which is detrimental
to applications that are sensitive to keypoint location. A simple yet effective
fine correlation module is developed to refine the matching candidates in both
source and target views to subpixel level, yielding a notable performance
improvement. Thorough
experimentation across an array of public benchmarks affirms CoMatch's
promising accuracy, efficiency, and generalizability.
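
The first two ideas in the abstract, covisibility-guided token aggregation and covisibility-assisted attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the abstract does not specify the architecture. The function names `condense_tokens` and `covis_attention`, the non-overlapping pooling window, and the choice to inject covisibility as a log-bias on the attention logits are all assumptions made purely for illustration.

```python
import numpy as np

def condense_tokens(feat, covis, window=2, eps=1e-8):
    """Hypothetical covisibility-weighted pooling over non-overlapping windows.

    feat:  (H, W, C) coarse feature map
    covis: (H, W) covisibility scores in [0, 1]
    Returns condensed tokens (h*w, C) and their mean covisibility (h*w,).
    """
    H, W, C = feat.shape
    h, w = H // window, W // window
    # Split the map into (h, window, w, window) blocks.
    f = feat[:h * window, :w * window].reshape(h, window, w, window, C)
    s = covis[:h * window, :w * window].reshape(h, window, w, window)
    # Normalize scores inside each window so they act as pooling weights.
    weights = s / (s.sum(axis=(1, 3), keepdims=True) + eps)
    tokens = (f * weights[..., None]).sum(axis=(1, 3))  # (h, w, C)
    token_covis = s.mean(axis=(1, 3))                   # (h, w)
    return tokens.reshape(h * w, C), token_covis.reshape(h * w)

def covis_attention(q, k, v, k_covis, eps=1e-8):
    """Hypothetical attention that down-weights keys with low covisibility.

    The log of each key's covisibility is added to the scaled dot-product
    logits, so a key with score near 0 receives almost no attention.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + np.log(k_covis + eps)[None, :]
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn

# Toy usage with random features and scores.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 8))
covis = rng.uniform(size=(4, 4))
tokens, token_covis = condense_tokens(feat, covis)
out, attn = covis_attention(tokens, tokens, tokens, token_covis)
```

In this sketch, condensing a 4x4 map with a 2x2 window reduces 16 tokens to 4, so attention cost drops accordingly, and keys from low-covisibility regions contribute little to the output.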