Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
Journal: arXiv
Published Date: Jun 1, 2025
Abstract
Current AIGC detectors often achieve near-perfect accuracy on images produced
by the same generator used for training but struggle to generalize to outputs
from unseen generators. We trace this failure in part to latent prior bias:
detectors learn shortcuts tied to patterns stemming from the initial noise
vector rather than learning robust generative artifacts. To address this, we
propose On-Manifold Adversarial Training (OMAT): by optimizing the initial
latent noise of diffusion models under fixed conditioning, we generate
on-manifold adversarial examples that remain on the generator's output
manifold. Pixel-space attacks, by contrast, introduce off-manifold perturbations
that the generator itself cannot reproduce and that can obscure the true
discriminative artifacts. To test against state-of-the-art generative models,
we introduce GenImage++, a test-only benchmark of outputs from advanced
generators (Flux.1, SD3) with extended prompts and diverse styles. We apply our
adversarial-training paradigm to ResNet50 and CLIP baselines and evaluate
across existing AIGC forensic benchmarks and recent challenge datasets.
Extensive experiments show that adversarially trained detectors significantly
improve cross-generator performance without any network redesign. Our findings
on latent prior bias offer valuable insights for future dataset construction
and detector evaluation, guiding the development of more robust and
generalizable AIGC forensic methodologies.
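
The latent-space optimization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `generator` below is a hypothetical tanh map standing in for a frozen diffusion sampler, the `detector` is a hypothetical logistic classifier, and all names, weights, and step sizes are assumptions chosen for illustration. The sketch shows only the core idea: the condition `c`, the generator, and the detector stay fixed, and only the initial latent `z` is updated, so every adversarial sample is still an output of the generator.

```python
import math


def generator(z, c):
    """Toy stand-in for a frozen generator: latent z and condition c -> 'image'."""
    return [math.tanh(zi + c) for zi in z]


def detector(x, w, b):
    """Toy logistic detector: returns the predicted probability that x is fake."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))


def latent_attack(z0, c, w, b, steps=50, lr=0.5):
    """Gradient descent on the latent only, lowering the detector's P(fake).

    The generator and detector parameters are never modified, and the
    condition c is held fixed throughout.
    """
    z = list(z0)
    for _ in range(steps):
        x = generator(z, c)
        p = detector(x, w, b)
        # Chain rule: dp/dz_i = p * (1 - p) * w_i * (1 - tanh(z_i + c)^2)
        grad = [p * (1.0 - p) * wi * (1.0 - xi * xi) for wi, xi in zip(w, x)]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


# Hypothetical toy values, chosen only for illustration.
w, b, c = [1.0, -0.5, 0.8, 0.3], 0.2, 0.1
z0 = [0.5, -0.3, 0.2, 0.1]

z_adv = latent_attack(z0, c, w, b)
p_before = detector(generator(z0, c), w, b)
p_after = detector(generator(z_adv, c), w, b)
x_adv = generator(z_adv, c)
```

Here `p_after` is lower than `p_before`, while `x_adv` is by construction a generator output rather than a perturbed image. In an adversarial-training loop, samples like `x_adv` would be labeled as fake and added to the detector's training data; with a real diffusion model the gradient would come from automatic differentiation through the sampler instead of the hand-written chain rule used here.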