Diffusion-based Synthetic Data Generation for Visible-Infrared Person Re-Identification
Journal: arXiv
Published Date: Mar 16, 2025
Abstract
The performance of models is intricately linked to the abundance of training
data. In Visible-Infrared person Re-IDentification (VI-ReID) tasks, collecting
and annotating large-scale images of each individual under various cameras and
modalities is tedious, time-consuming, and costly, and must comply with data
protection laws, which makes meeting dataset requirements a severe challenge.
Current research investigates the generation of synthetic data as an efficient
and privacy-preserving alternative to collecting real data in the field. However,
a data synthesis technique tailored to VI-ReID models has yet to be
explored. In this paper, we present a novel data generation framework, dubbed
Diffusion-based VI-ReID data Expansion (DiVE), that automatically generates
large numbers of identity-preserving RGB-IR paired images by decoupling identity
and modality, in order to improve the performance of VI-ReID models. Specifically,
identity representation is acquired from a set of samples sharing the same ID,
whereas the modality of images is learned by fine-tuning Stable Diffusion
(SD) on modality-specific data. DiVE extends text-driven image synthesis to
identity-preserving RGB-IR multimodal image synthesis. This approach
significantly reduces data collection and annotation costs by directly
incorporating synthetic data into ReID model training. Experiments
show that VI-ReID models trained on synthetic data produced by DiVE
consistently achieve notable performance gains. In particular, the state-of-the-art
method, CAJ, trained with synthetic images, achieves an improvement of about
9% in mAP over the baseline on the LLCM dataset. Code:
https://github.com/BorgDiven/DiVE
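
The step the abstract describes as directly incorporating synthetic data into ReID model training can be pictured as a simple dataset merge. The Python below is a minimal sketch under stated assumptions, not the DiVE implementation: the `Sample` record, the `merge_training_sets` helper, the `synth_per_id` cap, and the choice to keep only synthetic images whose identity already appears in the real set are all hypothetical and introduced for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One training image (hypothetical record layout)."""
    path: str        # image file path
    pid: int         # person identity label
    modality: str    # "rgb" or "ir"
    synthetic: bool  # True if generated rather than captured


def merge_training_sets(real, synthetic, synth_per_id=None, seed=0):
    """Combine real and synthetic samples into one training list.

    Assumption: synthetic images reuse the identity labels of the real
    set, so any synthetic sample with an unknown identity is dropped.
    `synth_per_id` caps the synthetic images kept per (identity,
    modality) pair; None keeps all of them.
    """
    rng = random.Random(seed)
    real_ids = {s.pid for s in real}

    # Group the usable synthetic samples by (identity, modality).
    grouped = {}
    for s in synthetic:
        if s.pid in real_ids:
            grouped.setdefault((s.pid, s.modality), []).append(s)

    kept = []
    for key in sorted(grouped):
        group = grouped[key]
        if synth_per_id is not None and len(group) > synth_per_id:
            # Subsample so that synthetic data does not swamp real data.
            group = rng.sample(group, synth_per_id)
        kept.extend(group)

    # Real samples come first, followed by the retained synthetic ones.
    return list(real) + kept
```

The merged list can then be handed to whatever data loader the ReID model already uses. Capping the synthetic images per identity is one possible way to control the real-to-synthetic ratio; the abstract does not state which ratio, if any, the authors use.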