IBD: An Interpretable Backdoor-Detection Method via Multivariate Interactions.

Journal: Sensors (Basel, Switzerland)

PMID: 36433292

Abstract

Recent work has shown that deep neural networks are vulnerable to backdoor attacks. In contrast to the success of backdoor-attack methods, existing backdoor-defense methods lack theoretical foundations and interpretable solutions. Most defense methods are built on experience with the characteristics of previous attacks and therefore fail to defend against new attacks. In this paper, we propose IBD, an interpretable backdoor-detection method via multivariate interactions. Using information-theoretic techniques, IBD reveals how a backdoor works from the perspective of multivariate interactions among features. Based on the interpretable theorem, IBD enables defenders to detect backdoored models and poisoned examples without requiring additional information about the specific attack method. Experiments on widely used datasets and models show that IBD achieves a 78% average increase in detection accuracy and an order-of-magnitude reduction in time cost compared with existing backdoor-detection methods.

Authors

Yixiao Xu

Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621900, China.

Xiaolei Liu

Department of Neurology, the First Affiliated Hospital of Kunming Medical University, Wuhua District, Kunming, Yunnan Province, China.

Kangyi Ding

Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621900, China.

Bangzhou Xin

Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621900, China.

Keywords

Humans; Inflammatory Bowel Diseases; Information Theory; Neural Networks, Computer

