Unlearnable Examples Detection via Iterative Filtering
Journal: arXiv
Published Date: Aug 15, 2024
Abstract
Deep neural networks are known to be vulnerable to data poisoning attacks.
Availability attacks, a recently studied type of data poisoning, add
imperceptible perturbations to images so that the data can no longer be used
effectively for model learning. Detecting the poisoned samples, also known as
Unlearnable Examples (UEs), in a mixed dataset is therefore both valuable and
challenging. In response, we propose an Iterative Filtering approach for UE
identification. This method leverages the distinction between
the inherent semantic mapping rules and shortcuts, without the need for any
additional information. We verify that when a classifier is trained on a mixed
dataset containing both UEs and clean data, the model fits the UEs more quickly
than the clean data. Exploiting this gap in training accuracy between clean and
poisoned samples, we employ a model that misclassifies clean samples while
correctly classifying the poisoned ones. The incorporation of additional
classes and iterative refinement enhances the model's ability to differentiate
between clean and poisoned samples. Extensive experiments demonstrate the
superiority of our method over state-of-the-art detection approaches across
various attacks, datasets, and poison ratios, significantly reducing the Half
Total Error Rate (HTER) compared to existing methods.
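
For readers unfamiliar with the metric, HTER is conventionally the average of a detector's two error rates. The sketch below is an illustration and not the authors' evaluation code. It assumes that "false acceptance" means a poisoned sample passed as clean and "false rejection" means a clean sample flagged as poisoned. The function name `hter` and its arguments are hypothetical.

```python
def hter(is_poisoned, flagged):
    """Half Total Error Rate of a binary poison detector.

    is_poisoned: ground-truth booleans, True for a poisoned sample (UE).
    flagged:     detector decisions, True if the sample was flagged as poisoned.
    Assumed convention: FAR counts poisoned samples passed as clean,
    FRR counts clean samples flagged as poisoned.
    """
    pairs = list(zip(is_poisoned, flagged))
    n_poison = sum(1 for p, _ in pairs if p)
    n_clean = len(pairs) - n_poison
    if n_poison == 0 or n_clean == 0:
        raise ValueError("need at least one clean and one poisoned sample")

    false_accepts = sum(1 for p, f in pairs if p and not f)
    false_rejects = sum(1 for p, f in pairs if not p and f)

    far = false_accepts / n_poison   # false acceptance rate
    frr = false_rejects / n_clean    # false rejection rate
    return 0.5 * (far + frr)


if __name__ == "__main__":
    truth = [True, True, True, True, False, False]
    preds = [True, True, True, False, True, False]
    # FAR = 1/4, FRR = 1/2, so HTER = 0.375
    print(hter(truth, preds))
```

Because the two error rates are averaged with equal weight, HTER does not depend on the poison ratio of the dataset, which makes it suitable for comparing detectors across different ratios.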