Dirty and Clean-Label attack detection using GAN discriminators
Journal:
arXiv
Published Date:
Jun 2, 2025
Abstract
Gathering enough images to train a deep computer vision model is a constant
challenge. Unfortunately, collecting images from unknown sources can leave your
model's behavior at risk of being manipulated by a dirty-label or clean-label
attack unless the images are properly inspected. Manually inspecting each
image-label pair is impractical, and common poison-detection methods that
involve re-training your model can be time-consuming. This research uses GAN
discriminators to protect a single class against mislabeled images and against
images modified at varying perturbation levels. The effect of these
perturbations on a basic convolutional neural network classifier is also
included for reference. The results suggest that, after training on a single
class, a GAN discriminator's confidence scores can provide a threshold for
identifying mislabeled images and, once the decision threshold is calibrated
using in-class samples, can identify 100% of the tested poison starting at a
perturbation magnitude (epsilon) of 0.20.
Developers can use this report as a basis to train their own discriminators to
protect high-value classes in their CV models.
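
The threshold calibration described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it assumes the discriminator emits a confidence score in [0, 1] where higher means "looks like the protected class", and it assumes a low-percentile rule over in-class scores for choosing the threshold. The function names `calibrate_threshold` and `flag_suspicious`, the percentile value, and all scores below are hypothetical placeholders.

```python
import numpy as np


def calibrate_threshold(in_class_scores, percentile=5.0):
    """Pick a decision threshold from discriminator scores on trusted
    in-class samples. The percentile rule is an assumption made for this
    sketch, not a rule taken from the paper."""
    scores = np.asarray(in_class_scores, dtype=float)
    if scores.size == 0:
        raise ValueError("need at least one in-class score to calibrate")
    return float(np.percentile(scores, percentile))


def flag_suspicious(scores, threshold):
    """Return a boolean mask: True where a score falls below the threshold,
    i.e. the image is a candidate mislabeled or perturbed sample."""
    return np.asarray(scores, dtype=float) < threshold


# Made-up scores for illustration only; real values would come from a
# discriminator trained on the single protected class.
in_class = np.array([0.91, 0.88, 0.95, 0.86, 0.93, 0.90, 0.89, 0.94])
threshold = calibrate_threshold(in_class)

incoming = np.array([0.92, 0.41, 0.89, 0.12])
flags = flag_suspicious(incoming, threshold)
```

In a setup like this, images whose flag is `True` would be held back for manual review rather than added to the training set; the percentile trades off false alarms on clean in-class images against missed poison.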