PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models
Journal: arXiv
Published Date: Feb 22, 2025
Abstract
Diffusion models (DMs) have revolutionized data generation, particularly in
text-to-image (T2I) synthesis. However, the widespread use of personalized
generative models raises significant concerns regarding privacy violations and
copyright infringement. To address these issues, researchers have proposed
adversarial perturbation-based protection techniques. Yet these methods
have notable limitations, including insufficient robustness against data
transformations and the inability to fully eliminate identifiable features of
protected objects in the generated output. In this paper, we introduce
PersGuard, a novel backdoor-based approach that prevents malicious
personalization of specific images. Unlike traditional adversarial perturbation
methods, PersGuard implants backdoor triggers into pre-trained T2I models,
preventing the generation of customized outputs for designated protected images
while allowing normal personalization for unprotected ones. Unfortunately,
existing backdoor methods for T2I diffusion models cannot be applied directly
to personalization scenarios, because their backdoor objectives differ and the
backdoor may be eliminated during downstream fine-tuning. To address these
challenges, we propose three novel backdoor objectives specifically designed
for personalization scenarios, coupled with a backdoor retention loss engineered
to resist downstream fine-tuning. These components are integrated into a
unified optimization framework. Extensive experimental evaluations demonstrate
PersGuard's effectiveness in preserving data privacy, even under challenging
conditions including gray-box settings, multi-object protection, and facial
identity scenarios. Our method significantly outperforms existing techniques,
offering a more robust solution for privacy and copyright protection.