Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
Journal:
arXiv
Published Date:
Jun 6, 2025
Abstract
Fine-tuning Stable Diffusion enables subject-driven image synthesis by
adapting the model to generate images containing specific subjects. However,
existing fine-tuning methods suffer from two key issues: underfitting, where
the model fails to reliably capture subject identity, and overfitting, where it
memorizes the subject image and reduces background diversity. To address these
challenges, we propose two auxiliary consistency losses for diffusion
fine-tuning. First, a prior consistency regularization loss ensures that the
predicted diffusion noise for prior (non-subject) images remains consistent
with that of the pretrained model, improving fidelity. Second, a subject
consistency regularization loss enhances the fine-tuned model's robustness to
multiplicative noise modulated latent code, helping to preserve subject
identity while improving diversity. Our experimental results demonstrate that
incorporating these losses into fine-tuning not only preserves subject identity
but also enhances image diversity, outperforming DreamBooth in terms of CLIP
scores, background variation, and overall visual quality.