CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation
Journal: arXiv
Published Date: Jul 7, 2025
Abstract
Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where
adversaries poison training data to implant a backdoor into the victim model.
Current backdoor defenses on poisoned data often suffer from high computational
costs or low effectiveness against advanced attacks like clean-label and
clean-image backdoors. To address these limitations, we introduce CLIP-Guided Backdoor
Defense (CGD), an efficient and effective method that mitigates various
backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify
inputs that are likely to be clean or poisoned. It then retrains the model with
these inputs, using CLIP's logits as guidance to effectively neutralize the
backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD
reduces attack success rates (ASRs) to below 1% while maintaining clean
accuracy (CA) with a maximum drop of only 0.3%, outperforming existing
defenses. Additionally, we show that clean-data-based defenses can be adapted
to poisoned data using CGD. CGD is also robust, maintaining
low ASRs even with a weaker CLIP model or when CLIP itself is
compromised by a backdoor. These findings underscore CGD's exceptional
efficiency, effectiveness, and applicability for real-world backdoor defense
scenarios. Code: https://github.com/binyxu/CGD.
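
The separation step described above can be illustrated with a minimal Python sketch. The abstract does not state the exact criterion, so everything specific here is an assumption made for illustration: the scoring rule (cross-entropy between CLIP's zero-shot prediction and the dataset label), the `clean_fraction` parameter, and all function names are hypothetical and are not taken from the authors' implementation. CLIP logits are assumed to be precomputed, one list of per-class scores per training sample.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_cross_entropy(logits, label):
    """Cross-entropy (in nats) of the dataset label under softmax(logits)."""
    probs = softmax(logits)
    # Clamp to avoid log(0) when the label probability underflows.
    return -math.log(max(probs[label], 1e-12))


def split_by_entropy(clip_logits, labels, clean_fraction=0.5):
    """Split sample indices into likely-clean and suspicious subsets.

    Assumption (not from the paper): samples whose dataset label agrees
    with CLIP's zero-shot prediction have low cross-entropy and are
    treated as likely clean; the remainder are treated as suspicious.
    """
    scores = [label_cross_entropy(l, y) for l, y in zip(clip_logits, labels)]
    # Rank samples from lowest to highest cross-entropy.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_clean = int(len(order) * clean_fraction)
    return sorted(order[:n_clean]), sorted(order[n_clean:]), scores


# Toy data: sample 2 is labeled class 2, but CLIP strongly predicts class 0.
clip_logits = [
    [5.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [6.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
]
labels = [0, 1, 2, 2]

likely_clean, suspicious, scores = split_by_entropy(
    clip_logits, labels, clean_fraction=0.75
)
print("likely clean:", likely_clean)
print("suspicious:", suspicious)
```

In this toy run the mislabeled-looking sample (index 2) receives by far the highest score and ends up in the suspicious subset. A fixed `clean_fraction` is the simplest possible threshold; the actual split rule, and how CLIP's logits then guide retraining, should be taken from the linked repository.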