Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability
Journal:
arXiv
Published Date:
Jun 26, 2025
Abstract
Detectors often suffer a performance drop due to the domain gap between
training and testing data. Recent methods explore diffusion models applied to
domain generalization (DG) and adaptation (DA) tasks, but still struggle with
large inference costs and have not yet fully leveraged the capabilities of
diffusion models. We propose to tackle these problems by extracting
intermediate features from a single-step diffusion process, improving feature
collection and fusion to reduce inference time by 75% while enhancing
performance on source domains (i.e., Fitness). Then, we construct an
object-centered auxiliary branch by applying box-masked images with class
prompts to extract robust and domain-invariant features that focus on objects.
We also apply a consistency loss to align the auxiliary and ordinary branches,
balancing fitness and generalization while preventing overfitting and improving
performance on target domains (i.e., Generalization). Furthermore, within a
unified framework, standard detectors are guided by diffusion detectors through
feature-level and object-level alignment on source domains (for DG) and
unlabeled target domains (for DA), thereby improving cross-domain detection
performance (i.e., Transferability). Our method achieves competitive results on
3 DA benchmarks and 5 DG benchmarks. Additionally, experiments on the COCO
generalization benchmark demonstrate that our method maintains significant
advantages and shows remarkable efficiency under large domain shifts and in
low-data scenarios. Our work shows the superiority of applying diffusion models to
domain generalized and adaptive detection tasks and offers valuable insights
for visual perception tasks across diverse domains. The code is available at
https://github.com/heboyong/Fitness-Generalization-Transferability.
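
As a rough illustration of the object-centered auxiliary branch and the consistency loss described above, the following minimal sketch masks an image to its ground-truth boxes and measures the disagreement between two feature maps. The abstract does not specify the mask fill value or the form of the loss, so the zero fill, the mean-squared-error loss, and the function names box_mask and consistency_loss are assumptions made for this example only, not the authors' implementation.

```python
import numpy as np

def box_mask(image, boxes):
    """Keep pixels inside any (x1, y1, x2, y2) box and zero out the rest.

    Zero fill is an assumption; the abstract only says "box-masked images".
    """
    mask = np.zeros(image.shape[:2], dtype=bool)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = True
    # Broadcast the H x W mask over the channel axis.
    return image * mask[..., None]

def consistency_loss(feat_main, feat_aux):
    """Mean squared error between ordinary-branch and auxiliary-branch features.

    MSE is an assumed stand-in for the paper's consistency loss.
    """
    return float(np.mean((feat_main - feat_aux) ** 2))

# Toy usage: a 4 x 4 RGB image with one 2 x 2 object box.
image = np.ones((4, 4, 3))
masked = box_mask(image, [(1, 1, 3, 3)])
loss = consistency_loss(np.zeros((2, 2)), np.ones((2, 2)))
```

In a real pipeline the two feature maps would come from the detector run on the original image and on the box-masked image; the sketch only shows the shape of the computation.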