Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization
Journal: arXiv
Published Date: Mar 18, 2025
Abstract
Fine-tuning text-to-image diffusion models facilitates image customization
but risks privacy breaches and opinion manipulation.
Current research focuses on prompt- or image-level adversarial attacks for
anti-customization, yet it overlooks the correlation between these two levels
and the relationship between internal modules and inputs. This hinders
anti-customization performance in practical threat scenarios. We propose Dual
Anti-Diffusion (DADiff), a two-stage adversarial attack targeting diffusion
customization, which, for the first time, integrates the adversarial
prompt-level attack into the generation process of image-level adversarial
examples. In stage 1, we generate prompt-level adversarial vectors to guide the
subsequent image-level attack. In stage 2, besides conducting the end-to-end
attack on the UNet model, we disrupt its self- and cross-attention modules,
aiming to break the correlations between image pixels and align the
cross-attention results computed using instance prompts and adversarial prompt
vectors within the images. Furthermore, we introduce a local random timestep
gradient ensemble strategy, which updates adversarial perturbations by
integrating gradients from timesteps randomly sampled within multiple
segments of the timestep range. Experimental
results on various mainstream facial datasets demonstrate 10%-30% improvements
in cross-prompt, keyword mismatch, cross-model, and cross-mechanism
anti-customization with DADiff compared to existing methods.
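
As a rough illustration of the local random timestep gradient ensemble idea described in the abstract, the sketch below splits the timestep range into contiguous segments, draws one random timestep from each, averages the resulting gradients, and applies a signed update under an L-infinity budget. This is a minimal sketch, not the authors' implementation: the abstract does not specify the number of segments, the update rule, or the perturbation budget, so `split_timesteps`, `toy_grad`, `ensemble_step`, the sign-step update, and the `eps` clipping are all assumptions, and `toy_grad` is a hypothetical stand-in for the gradient of the real diffusion loss.

```python
import random


def split_timesteps(total_steps, num_segments):
    """Split [0, total_steps) into contiguous, near-equal (lo, hi) segments."""
    bounds = [(i * total_steps) // num_segments for i in range(num_segments + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def toy_grad(x_adv, t, total_steps):
    """Hypothetical stand-in for the diffusion-loss gradient at timestep t.

    Uses the toy loss L_t(x) = 0.5 * w_t * sum(x_i^2) with w_t = (t + 1) / T,
    whose gradient is w_t * x_i. A real attack would backpropagate through
    the UNet instead.
    """
    w = (t + 1) / total_steps
    return [w * v for v in x_adv]


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def ensemble_step(x, x_adv, segments, total_steps, step_size, eps, rng):
    """One perturbation update using gradients from one random timestep per segment."""
    grads = []
    for lo, hi in segments:
        t = rng.randrange(lo, hi)  # local random timestep within this segment
        grads.append(toy_grad(x_adv, t, total_steps))
    # Ensemble: average the per-segment gradients element-wise.
    mean_grad = [sum(g[i] for g in grads) / len(grads) for i in range(len(x_adv))]
    # Assumed update rule: signed ascent step on the loss.
    stepped = [xa + step_size * sign(g) for xa, g in zip(x_adv, mean_grad)]
    # Assumed constraint: keep the perturbation within an L-infinity ball around x.
    return [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(stepped, x)]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.2, -0.5, 0.7]  # toy "image" with three pixels
    adv = list(clean)
    segments = split_timesteps(total_steps=1000, num_segments=4)
    for _ in range(10):
        adv = ensemble_step(clean, adv, segments, 1000, 0.01, 0.05, rng)
    print(adv)
```

Sampling one timestep per segment, rather than a single timestep from the whole range, is one way to make each update reflect both low-noise and high-noise steps; whether DADiff weights or combines the segments differently is not stated in the abstract.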