Layer Frozen Multi-Net & Latent Space Feature-Concealed Backdoor Samples Detection.

Journal: Neural networks : the official journal of the International Neural Network Society
Published Date:

Abstract

Feature-concealed backdoor samples, which are entangled with the benign semantics of the target class or carry dynamic triggers, are difficult to detect. Existing methods focus on differences between sample distributions in the latent space of the victim model. However, backdoor samples that contain target-class discriminative semantics tend to produce distributions that overlap with those of benign samples, which leads to missed detections. Moreover, these methods assume that the distribution of backdoor samples is continuous and clustered, so they have difficulty precisely fitting the fractured distribution that arises from dynamic-trigger backdoor samples. The result is a high false-detection rate for benign samples located between distribution sub-clusters. LFMN-LS, a Layer Frozen Multi-Net and Multi-Latent Space method for detecting backdoor samples, is proposed. A Trigger-Net and a Benign-Net are constructed by knowledge refinement of the victim model; they capture the distributions of backdoor and benign samples separately, lessening the adverse effects of inter-class distributional overlap. Furthermore, a Relative Cosine Distance is designed to jointly measure the distribution difference between backdoor and benign samples across multiple latent spaces, mitigating distributional fractures in any single latent space. Experimental results demonstrate that LFMN-LS outperforms state-of-the-art methods. LFMN-LS integrates hidden-layer freezing into knowledge refinement, which preserves the high-order features of samples.
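
The abstract does not define the Relative Cosine Distance, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that each latent space (layer) supplies one feature vector for the sample plus one reference vector for backdoor samples and one for benign samples, and that per-layer scores are simply averaged. The names `cosine_distance`, `relative_score`, `trigger_refs`, and `benign_refs` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def relative_score(sample_feats, trigger_refs, benign_refs, eps=1e-12):
    """Hypothetical relative score averaged over several latent spaces.

    Each argument is a list with one vector per latent space (layer).
    Per layer, the score is d_benign / (d_benign + d_trigger): it is
    close to 1 when the sample lies near the backdoor reference and
    close to 0 when it lies near the benign reference. This ratio and
    the plain average across layers are assumptions of this sketch.
    """
    scores = []
    for feat, t_ref, b_ref in zip(sample_feats, trigger_refs, benign_refs):
        d_trigger = cosine_distance(feat, t_ref)
        d_benign = cosine_distance(feat, b_ref)
        scores.append(d_benign / (d_benign + d_trigger + eps))
    return sum(scores) / len(scores)


# Toy reference vectors for two latent spaces (illustrative values only).
trigger_refs = [[1.0, 0.0], [0.0, 1.0]]
benign_refs = [[0.0, 1.0], [1.0, 0.0]]

backdoor_like = relative_score([[1.0, 0.0], [0.0, 1.0]], trigger_refs, benign_refs)
benign_like = relative_score([[0.0, 1.0], [1.0, 0.0]], trigger_refs, benign_refs)
ambiguous = relative_score([[1.0, 1.0], [1.0, 1.0]], trigger_refs, benign_refs)
```

Under these assumptions, combining several latent spaces means that a sample which looks ambiguous in one layer can still be separated by the others, which is the intuition the abstract gives for mitigating fractured distributions in a single latent space.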

Authors

  • Jiawei Li
    School of Chemistry & Chemical Engineering, College of Guangling, Yangzhou University, Yangzhou 225002, PR China. zhuxiashi@sina.com
  • Senlin Luo
  • Limin Pan
  • Chenlong Zhang
    Information System and Security & Countermeasures Experimental Center, Beijing Institute of Technology, Beijing, 100081, China.
  • Zhao Zhang
  • Chuan Lu
    Department of Computer Science, Aberystwyth University, Aberystwyth, United Kingdom.