CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models
Journal: arXiv
Published Date: Jan 9, 2025
Abstract
With advances in diffusion models, image generation has shown significant
performance improvements. This raises concerns about the potential abuse of
image generation, such as the creation of explicit or violent images, commonly
referred to as Not Safe For Work (NSFW) content. To address this, the Stable
Diffusion model includes several safety checkers to censor initial text prompts
and final output images generated from the model. However, recent research has
shown that these safety checkers are vulnerable to adversarial attacks, which
bypass the checkers and cause the model to generate NSFW images. In this paper,
we find that
these adversarial attacks are not robust to small changes in text prompts or
input latents. Based on this, we propose CROPS (Circular or RandOm Prompts for
Safety), a model-agnostic framework that easily defends against adversarial
attacks generating NSFW images without requiring additional training. Moreover,
we develop an approach that utilizes one-step diffusion models for efficient
NSFW detection (CROPS-1), further reducing computational cost. We
demonstrate the superiority of our method in terms of performance and
applicability.
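
The core observation, that adversarial prompts stop working under small
perturbations, can be illustrated with a short sketch. The abstract does not
specify how CROPS constructs its prompts, so everything below is an assumption
made for illustration: "circular" is read as rotating the words of the prompt,
"random" as inserting a random word, and the names circular_variants,
random_variants, flag_prompt, and is_unsafe are hypothetical. The safety checker
is passed in as a plain callable and stands in for whatever prompt or image
checker is actually used.

```python
import random
from typing import Callable, List


def circular_variants(prompt: str) -> List[str]:
    """Return every rotation of the prompt's whitespace-separated words.

    Assumption: this is one plausible reading of "circular prompts",
    not the construction used in the paper.
    """
    words = prompt.split()
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def random_variants(prompt: str, vocab: List[str], n: int, seed: int = 0) -> List[str]:
    """Return n copies of the prompt, each with one random vocab word inserted.

    Assumption: a stand-in for "random prompts"; vocab is a hypothetical
    list of benign filler words.
    """
    rng = random.Random(seed)
    words = prompt.split()
    variants = []
    for _ in range(n):
        perturbed = list(words)
        position = rng.randrange(len(perturbed) + 1)
        perturbed.insert(position, rng.choice(vocab))
        variants.append(" ".join(perturbed))
    return variants


def flag_prompt(
    prompt: str,
    is_unsafe: Callable[[str], bool],
    vocab: List[str],
    n_random: int = 4,
) -> bool:
    """Flag the prompt if the checker fires on it or on any perturbed variant.

    is_unsafe is a hypothetical checker supplied by the caller. No training
    is involved: the existing checker is simply queried several times.
    """
    candidates = [prompt] + circular_variants(prompt)
    candidates += random_variants(prompt, vocab, n_random)
    return any(is_unsafe(candidate) for candidate in candidates)
```

The intuition behind the design is that an attack tuned to slip one exact prompt
past a checker is unlikely to slip every perturbed variant past it as well, so
taking the logical OR over variants catches it without retraining anything. The
trade-off is one checker call per variant, which is where a cheaper detector
such as the one-step variant mentioned above would matter.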