TRAIL: Transferable Robust Adversarial Images via Latent diffusion
Journal:
arXiv
Published Date:
May 22, 2025
Abstract
Adversarial attacks exploiting unrestricted natural perturbations present
severe security risks to deep learning systems, yet their transferability
across models remains limited due to distribution mismatches between generated
adversarial features and real-world data. While recent works utilize
pre-trained diffusion models as adversarial priors, they still encounter
challenges due to the distribution shift between the distribution of ideal
adversarial samples and the natural image distribution learned by the diffusion
model. To address the challenge, we propose Transferable Robust Adversarial
Images via Latent Diffusion (TRAIL), a test-time adaptation framework that
enables the model to generate images from a distribution of images with
adversarial features and closely resembles the target images. To mitigate the
distribution shift, during attacks, TRAIL updates the diffusion U-Net's weights
by combining adversarial objectives (to mislead victim models) and perceptual
constraints (to preserve image realism). The adapted model then generates
adversarial samples through iterative noise injection and denoising guided by
these objectives. Experiments demonstrate that TRAIL significantly outperforms
state-of-the-art methods in cross-model attack transferability, validating that
distribution-aligned adversarial feature synthesis is critical for practical
black-box attacks.