Improving transferability of adversarial examples via statistical attribution-based attacks.
Journal:
Neural Networks: the official journal of the International Neural Network Society
PMID:
40086136
Abstract
Adversarial attacks are significant in uncovering vulnerabilities and assessing the robustness of deep neural networks (DNNs), offering profound insights into their internal mechanisms. Feature-level attacks, a potent approach, craft adversarial examples by extensively corrupting the intermediate-layer features of the source model at each iteration. However, such attacks often rely on imprecise metrics to assess the significance of features, which can limit the transferability of the resulting adversarial examples. To address these issues, this paper introduces the Statistical Attribution-based Attack (SAA), which focuses on finding better feature-importance representations and refining the optimization objective, thereby achieving stronger attack performance. To compute a Comprehensive Gradient that represents feature importance more accurately, we introduce Region-wise Feature Disturbance and Gradient Information Aggregation, which effectively disrupt the model's attention focus areas. Subsequently, a statistical attribution-based approach is employed that leverages the average feature information across layers to provide a more advantageous optimization objective. Experiments validate the superiority of the method. Specifically, SAA improves the attack success rate by 9.3% compared with the second-best method, and when combined with input transformation methods it achieves an average success rate of 79.2% against eight leading defense models.
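To make the general idea concrete, the sketch below illustrates a feature-importance-weighted transfer attack in the spirit described by the abstract: gradients of an intermediate layer are aggregated over region-wise disturbed (randomly masked) inputs to form importance weights, and the adversarial example is then optimized to suppress the weighted features. This is a minimal sketch under assumed details (PyTorch, a hypothetical `layer` handle, the masking scheme, and all hyperparameters); it is not the authors' released implementation.

```python
# Hedged sketch of a feature-level transfer attack with aggregated gradients as
# importance weights. Helper names, mask granularity, and step sizes are assumptions.
import torch
import torch.nn.functional as F

def aggregated_feature_weights(model, layer, x, y, n_masks=30, keep_prob=0.7):
    """Average gradients of the true-class score w.r.t. an intermediate layer
    over randomly masked (region-wise disturbed) copies of the input."""
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
    grad_sum = None
    for _ in range(n_masks):
        # Coarse region-wise binary mask, upsampled to the input resolution.
        coarse = (torch.rand(x.size(0), 1, 7, 7, device=x.device) < keep_prob).float()
        mask = F.interpolate(coarse, size=x.shape[-2:], mode="nearest")
        x_masked = (x * mask).clone().requires_grad_(True)
        logits = model(x_masked)
        score = logits.gather(1, y.unsqueeze(1)).sum()  # true-class score
        g = torch.autograd.grad(score, feats["out"])[0]
        grad_sum = g if grad_sum is None else grad_sum + g
    handle.remove()
    return grad_sum / n_masks  # aggregated gradient acts as an importance map

def feature_attack(model, layer, x, y, eps=16 / 255, alpha=1.6 / 255, steps=10):
    """Iteratively perturb x (within an L-infinity ball) so that
    importance-weighted intermediate features are suppressed."""
    weights = aggregated_feature_weights(model, layer, x, y).detach()
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        model(x_adv)
        # Decreasing this weighted activation sum pushes features away from the class.
        loss = (weights * feats["out"]).sum()
        grad = torch.autograd.grad(loss, x_adv)[0]
        delta = (x_adv.detach() - alpha * grad.sign() - x).clamp(-eps, eps)
        x_adv = (x + delta).clamp(0, 1)
    handle.remove()
    return x_adv.detach()
```

In this reading, the aggregation over many masked inputs plays the role of the "Comprehensive Gradient" and yields a smoother importance estimate than a single backward pass; the paper's statistical attribution over average cross-layer features would replace the simple weighted-sum objective used here.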