Dirty and Clean-Label attack detection using GAN discriminators

Journal: arXiv

Published Date: Jun 2, 2025

Abstract

Gathering enough images to train a deep computer vision model is a constant challenge. Unfortunately, collecting images from unknown sources can leave your model's behavior at risk of being manipulated by a dirty-label or clean-label attack unless the images are properly inspected. Manually inspecting each image-label pair is impractical, and common poison-detection methods that involve re-training your model can be time-consuming. This research uses GAN discriminators to protect a single class against mislabeled images and images modified with different levels of perturbation. The effect of these perturbations on a basic convolutional neural network classifier is also included for reference. The results suggest that, after training on a single class, a GAN discriminator's confidence scores can provide a threshold to identify mislabeled images and, after decision threshold calibration using in-class samples, identify 100% of the tested poison starting at a perturbation epsilon magnitude of 0.20. Developers can use this report as a basis to train their own discriminators to protect high-value classes in their CV models.
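The calibration step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the decision threshold is derived, so the low-quantile rule below, the function names `calibrate_threshold` and `flag_suspects`, the assumption that lower discriminator confidence means "more suspicious", and all score values are hypothetical illustrations and not the paper's procedure.

```python
def calibrate_threshold(in_class_scores, quantile=0.05):
    """Pick a decision threshold from discriminator scores of trusted in-class images.

    ASSUMPTION: a low quantile of the clean scores is used as the cutoff;
    the paper may calibrate differently.
    """
    if not in_class_scores:
        raise ValueError("need at least one in-class score")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    ordered = sorted(in_class_scores)
    # Simple rank-based quantile: index into the sorted clean scores.
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def flag_suspects(scores, threshold):
    """Return indices of images whose discriminator score falls below the threshold.

    ASSUMPTION: a lower score means the discriminator finds the image less
    consistent with the protected class.
    """
    return [i for i, score in enumerate(scores) if score < threshold]


# Hypothetical discriminator confidences for trusted images of the protected class.
clean_scores = [0.91, 0.88, 0.95, 0.79, 0.93, 0.85, 0.97, 0.90, 0.86, 0.92]
threshold = calibrate_threshold(clean_scores, quantile=0.1)

# Hypothetical scores for newly collected images labeled as the same class.
candidate_scores = [0.94, 0.31, 0.82, 0.12, 0.78]
suspects = flag_suspects(candidate_scores, threshold)
print(threshold, suspects)
```

Images flagged this way would still need review before being dropped, since a stricter or looser quantile trades false alarms on clean images against missed poison.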

Authors

John W. Smutny

External Resources

View on arXiv (http://arxiv.org/abs/2506.01224v2)
