Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models
Journal: arXiv
Published Date: Mar 28, 2025
Abstract
Deep neural networks (DNNs) are susceptible to Universal Adversarial
Perturbations (UAPs), which are instance-agnostic perturbations that can
deceive a target model across a wide range of samples. Unlike instance-specific
adversarial examples, UAPs present a greater challenge as they must generalize
across different samples and models. Generating UAPs typically requires access
to numerous examples, which is a strong assumption in real-world tasks. In this
paper, we propose a novel data-free method, Intrinsic UAP (IntriUAP), that
exploits the intrinsic vulnerabilities of deep models. We analyze a series of
popular deep models composed of linear and nonlinear layers with a Lipschitz
constant of 1, revealing that the vulnerability of these models is
predominantly influenced by their linear components. Based on this observation,
we leverage the ill-conditioned nature of the linear components by aligning the
UAP with the right singular vectors corresponding to the maximum singular value
of each linear layer. Remarkably, our method achieves highly competitive
performance in attacking popular image classification deep models without using
any image samples. We also evaluate the black-box attack performance of our
method, showing that it matches the state-of-the-art baseline for data-free
methods on models that conform to our theoretical framework. Beyond the
data-free assumption, IntriUAP also operates under a weaker assumption, in which
the adversary can access only a few of the victim model's layers. Experiments
demonstrate that the attack success rate decreases by only 4% when the
adversary has access to just 50% of the linear layers in the victim model.
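
The core idea of aligning a perturbation with the right singular vector of the largest singular value can be illustrated for a single linear layer. The sketch below is not the authors' IntriUAP algorithm; it only shows, for a toy ill-conditioned weight matrix, that the unit-norm input direction with the largest output gain is the top right singular vector. The matrix `W`, the helper names, and the use of power iteration on W^T W are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Compute W @ x for a matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rmatvec(W, y):
    """Compute W^T @ y."""
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(len(W[0]))]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))


def top_right_singular_vector(W, iters=500, seed=0):
    """Power iteration on W^T W (an assumed, simple way to get the top pair).

    Returns (v, sigma): a unit vector v approximating the right singular
    vector of the largest singular value, and sigma = ||W v||_2.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]
    n = norm(v)
    v = [vi / n for vi in v]
    for _ in range(iters):
        u = rmatvec(W, matvec(W, v))  # one step of v <- W^T W v
        n = norm(u)
        v = [ui / n for ui in u]
    return v, norm(matvec(W, v))


# Hypothetical ill-conditioned layer: singular values 3.0 and 0.5.
W = [[3.0, 0.0],
     [0.0, 0.5]]

v, sigma = top_right_singular_vector(W)

# Output gain along the found direction versus along the weak direction.
gain_aligned = norm(matvec(W, v))
gain_weak = norm(matvec(W, [0.0, 1.0]))
```

Here a unit perturbation along `v` is amplified by the largest singular value, while the same budget spent on the other axis is attenuated. Extending this to a full network with many layers and a norm-bounded image perturbation is the subject of the paper and is not attempted in this sketch.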