Strongly concealed adversarial attack against text classification models with limited queries.

Journal: Neural networks : the official journal of the International Neural Network Society

Abstract

In black-box scenarios, adversarial attacks against text classification models struggle to produce highly usable adversarial samples, particularly because of the large number of invalid queries on long texts. Existing methods select distractor words by comparing the confidence vectors obtained before and after deleting each word, so the number of queries grows linearly with text length, which makes these methods difficult to apply when queries are limited. Generating adversarial samples from a thesaurus can introduce semantic inconsistencies and even grammatical errors, making the adversarial samples easy for the target model to recognize and lowering the attack success rate. A parallel and highly stealthy Adversarial Attack against Text Classification Model (AdATCM) is proposed, which jointly reinforces the dual tasks of attack and generation. The method does not query the target model when selecting distractors; instead, it uses contextual information to compute word importance directly and selects all distractors in a single pass, strengthening the concealment of the attack. It integrates a KL divergence loss, a cross-entropy loss, and an adversarial loss into an objective function for training the adversarial attack model, producing adversarial samples that fit the original sample distribution and raising the attack success rate. Experimental results show that the method achieves a high success rate and strong concealment while effectively reducing the number of attack queries on long texts.
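The abstract names the three loss terms (KL divergence, cross-entropy, adversarial) but not their exact formulation. The sketch below shows one plausible way such an objective could be combined in PyTorch; the function name, tensor arguments, and weighting coefficients are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def combined_objective(gen_logits, ref_logits, target_tokens,
                       victim_logits, true_labels,
                       w_kl=1.0, w_ce=1.0, w_adv=1.0):
    """Hypothetical combined objective: KL + cross-entropy + adversarial terms.

    gen_logits:    token logits from the adversarial generator
    ref_logits:    token logits for the original sample (reference distribution)
    target_tokens: reference token ids for the cross-entropy (fluency) term
    victim_logits: victim classifier logits on the generated sample
    true_labels:   ground-truth class labels of the original samples
    """
    # KL term: keep the generated token distribution close to the original one,
    # so adversarial samples fit the original sample distribution
    kl_loss = F.kl_div(F.log_softmax(gen_logits, dim=-1),
                       F.softmax(ref_logits, dim=-1),
                       reduction="batchmean")

    # Cross-entropy term: token-level reconstruction / fluency
    ce_loss = F.cross_entropy(gen_logits.view(-1, gen_logits.size(-1)),
                              target_tokens.view(-1))

    # Adversarial term: push the victim classifier away from the true label
    adv_loss = -F.cross_entropy(victim_logits, true_labels)

    return w_kl * kl_loss + w_ce * ce_loss + w_adv * adv_loss
```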

Authors

  • Yao Cheng
    Southwest Jiaotong University, State Key Laboratory of Traction Power, Chengdu, 610000, China.
  • Senlin Luo
  • Yunwei Wan
    School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, PR China. Electronic address: francishijack@gmail.com.
  • Limin Pan
  • Xinshuai Li
    School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, PR China. Electronic address: bfs_lxs@163.com.