Navigating with Annealing Guidance Scale in Diffusion Space
Journal: arXiv
Published Date: Jun 30, 2025
Abstract
Denoising diffusion models excel at generating high-quality images
conditioned on text prompts, yet their effectiveness heavily relies on careful
guidance during the sampling process. Classifier-Free Guidance (CFG) provides a
widely used mechanism for steering generation by setting the guidance scale,
which balances image quality and prompt alignment. However, the choice of
guidance scale critically affects convergence toward a visually appealing and
prompt-adherent image. In this work, we propose an annealing guidance
scheduler that dynamically adjusts the guidance scale over time based
on the conditional noisy signal. By learning a scheduling policy, our method
addresses the temperamental behavior of CFG. Empirical results demonstrate that
our guidance scheduler significantly enhances image quality and alignment with
the text prompt, advancing the performance of text-to-image generation.
Notably, our novel scheduler incurs no additional activations or memory
consumption and can seamlessly replace the common classifier-free guidance,
offering an improved trade-off between prompt alignment and quality.
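
For readers unfamiliar with the mechanism, the following minimal sketch shows the standard CFG combination step and where a time-varying guidance scale would plug in. The function names and the `linear_scale` schedule are hypothetical and serve only as illustration: the schedule is a hand-written stand-in, not the learned scheduling policy proposed in this work, and the default values 7.5 and 1.0 are assumptions.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance combination.

    Extrapolates from the unconditional noise prediction toward the
    conditional one. Works on floats or on array types that support
    elementwise arithmetic.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)


def linear_scale(step, num_steps, w_start=7.5, w_end=1.0):
    """Hypothetical hand-written schedule (NOT the paper's learned policy).

    Linearly interpolates the guidance scale from w_start at the first
    sampling step to w_end at the last one.
    """
    if num_steps < 2:
        return w_start
    frac = step / (num_steps - 1)
    return w_start + frac * (w_end - w_start)


def guided_prediction(eps_uncond, eps_cond, step, num_steps):
    """Apply CFG with a step-dependent scale instead of a fixed constant."""
    return cfg_combine(eps_uncond, eps_cond, linear_scale(step, num_steps))
```

With a scale of 1 the result equals the conditional prediction, and with a scale of 0 it equals the unconditional one; larger values push further toward the prompt. A learned scheduler such as the one described above would take the place of `linear_scale`.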