Without Paired Labeled Data: An End-to-End Self-Supervised Paradigm for UAV-View Geo-Localization
Journal: arXiv
Published Date: Feb 17, 2025
Abstract
UAV-View Geo-Localization (UVGL) aims to achieve accurate localization of
unmanned aerial vehicles (UAVs) by retrieving the most relevant GPS-tagged
satellite images. However, existing methods rely heavily on pre-paired
UAV-satellite images for supervised learning. This dependency not only incurs
high annotation costs but also severely limits scalability and practical
deployment in open-world UVGL scenarios. To address these limitations, we
propose an end-to-end self-supervised UVGL method. Our method leverages a
shallow backbone network to extract initial features, employs clustering to
generate pseudo labels, and adopts a dual-path contrastive learning
architecture to learn discriminative intra-view representations. Furthermore,
our method incorporates two core modules, the dynamic hierarchical memory
learning module and the information consistency evolution learning module. The
dynamic hierarchical memory learning module combines short-term and long-term
memory to enhance intra-view feature consistency and discriminability.
Meanwhile, the information consistency evolution learning module leverages a
neighborhood-driven dynamic constraint mechanism to systematically capture
implicit cross-view semantic correlations, thereby improving cross-view feature
alignment. To further stabilize and strengthen self-supervised training, we
introduce a pseudo-label enhancement strategy that improves the quality of the
pseudo supervision. Our method ultimately constructs a unified
cross-view feature representation space under self-supervised settings.
Extensive experiments on three public benchmark datasets demonstrate that the
proposed method consistently outperforms existing self-supervised methods and
even surpasses several state-of-the-art supervised methods. Our code is
available at https://github.com/ISChenawei/DMNIL.
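
The intra-view part of the pipeline described in the abstract (extract features, cluster them into pseudo labels, then learn contrastively against a memory of cluster prototypes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy k-means stands in for whatever clustering the paper uses, a single momentum-updated memory stands in for the dynamic hierarchical memory, and all function names, the temperature, and the momentum value are hypothetical. The cross-view consistency module and the pseudo-label enhancement step are not shown.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def kmeans_pseudo_labels(features, k, iters=20, seed=0):
    """Toy spherical k-means (assumed stand-in for the clustering step).

    Returns one pseudo label per sample and the unit-norm cluster centroids.
    """
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so the input features are never modified.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(features @ centroids.T, axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
        centroids = l2_normalize(centroids)
    labels = np.argmax(features @ centroids.T, axis=1)
    return labels, centroids

def cluster_contrastive_loss(features, labels, memory, temperature=0.05):
    """InfoNCE-style loss: pull each feature toward its own cluster prototype
    in memory and push it away from the other prototypes."""
    logits = features @ memory.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

def momentum_update(memory, features, labels, momentum=0.9):
    """Slowly blend new features into their cluster prototype (hypothetical
    simplification of a long-term memory); returns a new array."""
    updated = memory.copy()
    for f, y in zip(features, labels):
        updated[y] = l2_normalize(momentum * updated[y] + (1 - momentum) * f)
    return updated

# Dual-path usage: each view is clustered and trained against its own memory.
rng = np.random.default_rng(0)
uav_feats = l2_normalize(rng.normal(size=(64, 16)))  # placeholder backbone output
sat_feats = l2_normalize(rng.normal(size=(64, 16)))  # placeholder backbone output

for name, feats in (("uav", uav_feats), ("satellite", sat_feats)):
    pseudo_labels, memory = kmeans_pseudo_labels(feats, k=8)
    loss = cluster_contrastive_loss(feats, pseudo_labels, memory)
    memory = momentum_update(memory, feats, pseudo_labels)
    print(name, "intra-view loss:", float(loss))
```

In a real training loop the features would come from the backbone network and the loss would be backpropagated through it; the random arrays here only exercise the shapes and the control flow.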