SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
Journal: arXiv
Published Date: Apr 13, 2025
Abstract
Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific
persons across cameras with different viewpoints. Previous works focus on
designing discriminative ReID models to maintain identity consistency despite
drastic changes in camera viewpoints. The idea behind these methods is natural,
but designing a view-robust network remains very challenging.
Moreover, they overlook the contribution of view-specific features in enhancing
the model's capability to represent persons. To address these issues, we
propose a novel two-stage feature learning framework named SD-ReID for AG-ReID,
which exploits the strong understanding capacity of generative
models, e.g., Stable Diffusion (SD), to generate view-specific features across
different viewpoints. In the first stage, we train a simple ViT-based model to
extract coarse-grained representations and controllable conditions. Then, in
the second stage, we fine-tune the SD model to learn complementary
representations guided by the controllable conditions. Furthermore, we propose
the View-Refine Decoder (VRD) to obtain additional controllable conditions to
generate missing cross-view features. Finally, we use the coarse-grained
representations and all-view features generated by SD to retrieve target
persons. Extensive experiments on the AG-ReID benchmarks demonstrate the
effectiveness of our proposed SD-ReID. The source code will be available upon
acceptance.
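
The final retrieval step described above, which combines the coarse-grained representations with the all-view features generated by SD, can be illustrated with a minimal sketch. The abstract does not state how the features are fused or which similarity measure is used, so the concatenation, L2 normalization, and cosine-similarity ranking below are assumptions, and the names `fuse_features` and `rank_gallery` are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against zero rows)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse_features(coarse, view_feats):
    """Fuse coarse-grained and generated per-view features (assumed scheme).

    coarse:     array of shape (N, D), e.g. from the stage-one ViT model.
    view_feats: list of arrays of shape (N, D_v), one per viewpoint,
                standing in for the features generated by the SD model.
    Returns an (N, D + sum(D_v)) array with unit-norm rows.
    """
    # Normalize each part first so that no single feature dominates.
    parts = [l2_normalize(coarse)] + [l2_normalize(v) for v in view_feats]
    return l2_normalize(np.concatenate(parts, axis=-1))


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity.

    query:   (Q, F) fused query features with unit-norm rows.
    gallery: (G, F) fused gallery features with unit-norm rows.
    Output:  (Q, G) integer array; column 0 is the best match per query.
    """
    sims = query @ gallery.T  # cosine similarity, since rows are unit norm
    return np.argsort(-sims, axis=1)
```

In this sketch, each query would be fused in the same way as the gallery, and the first column of the `rank_gallery` output would give the top-ranked candidate for each query.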