Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance
Journal: arXiv
Published Date: May 27, 2025
Abstract
Classifier-Free Guidance (CFG) is a widely used technique for improving
conditional diffusion models by linearly combining the outputs of conditional
and unconditional denoisers. While CFG enhances visual quality and improves
alignment with prompts, it often reduces sample diversity, leading to a
challenging trade-off between quality and diversity. To address this issue, we
make two key contributions. First, we show that CFG generally does not
correspond to a well-defined denoising diffusion model (DDM). In particular,
contrary to common intuition, CFG does not yield samples from the target
distribution associated with the limiting CFG score as the noise level
approaches zero, namely the data distribution tilted by a power $w > 1$ of the
conditional distribution, $\pi_w(x) \propto p(x \mid c)^{w}\, p(x)^{1-w}$. We
identify the missing component: a Rényi divergence term that acts as a
repulsive force and is required to correct CFG and make it consistent with a
proper DDM. Our analysis shows that this correction term
vanishes in the low-noise limit. Second, motivated by this insight, we propose
a Gibbs-like sampling procedure to draw samples from the desired tilted
distribution. This method starts with an initial sample from the conditional
diffusion model without CFG and iteratively refines it, preserving diversity
while progressively enhancing sample quality. We evaluate our approach on both
image and text-to-audio generation tasks, demonstrating substantial
improvements over CFG across all considered metrics. The code is available at
https://github.com/yazidjanati/cfgig
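Where a Rényi term can come from is visible in a short calculation. The notation here is ours and may differ from the paper: $q_{t|0}$ is the forward noising kernel, $p_t$ and $p_t(\cdot \mid c)$ are the noised unconditional and conditional marginals, $p_{0|t}$ are the corresponding denoising posteriors, and $\pi_{w,t}$ is the tilted target after noising. With $\varepsilon$-prediction denoisers, $\nabla \log p_t = -\hat\varepsilon / \sigma_t$, so linearly combining denoiser outputs is the same as combining scores. The second line uses $q_{t|0}\, p(x_0 \mid c) = p_t(x_t \mid c)\, p_{0|t}(x_0 \mid x_t, c)$, the analogous unconditional identity, and $q_{t|0} = q_{t|0}^{w}\, q_{t|0}^{1-w}$; $D_w(P \,\|\, Q) = \frac{1}{w-1} \log \int p^{w} q^{1-w}$ denotes the Rényi divergence of order $w$.

```latex
\begin{aligned}
\pi_w(x_0) &\propto p(x_0 \mid c)^{w}\, p(x_0)^{1-w} \;\propto\; p(x_0)\, p(c \mid x_0)^{w}, \\
\pi_{w,t}(x_t) &= \int q_{t|0}(x_t \mid x_0)\, \pi_w(x_0)\, \mathrm{d}x_0 \\
&\propto p_t(x_t \mid c)^{w}\, p_t(x_t)^{1-w}
  \int p_{0|t}(x_0 \mid x_t, c)^{w}\, p_{0|t}(x_0 \mid x_t)^{1-w}\, \mathrm{d}x_0, \\
\nabla_{x_t} \log \pi_{w,t}(x_t)
&= \underbrace{w\, \nabla \log p_t(x_t \mid c) + (1-w)\, \nabla \log p_t(x_t)}_{\text{CFG score}} \\
&\quad + (w-1)\, \nabla_{x_t} D_w\big(p_{0|t}(\cdot \mid x_t, c) \,\big\|\, p_{0|t}(\cdot \mid x_t)\big).
\end{aligned}
```

Dropping the last term recovers the CFG score. Because the abstract reports that this correction vanishes in the low-noise limit, CFG targets $\pi_w$ correctly only as $t \to 0$.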
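The two claims can be illustrated on a toy problem. The sketch below rests on several assumptions of our own, none taken from the paper's experiments. It uses a 1-D problem with Gaussian $p(x)$ and $p(x \mid c)$, so every noised score is exact. The forward process is variance-exploding, and sampling uses a deterministic Euler probability-flow ODE. The refinement loop re-noises each sample to a small level `sigma_star` and then denoises it with the CFG score. This loop is our reading of the Gibbs-like procedure described above, not the authors' algorithm; their implementation is in the linked repository. All names (`cfg_score`, `cfg_ode`, `gibbs_like`, `sigma_star`, ...) are hypothetical.

```python
import numpy as np

# Hypothetical 1-D toy (our assumption, not the paper's setup): Gaussian
# unconditional and conditional data laws, so every noised score is exact.
MU_U, S_U = 0.0, 1.0  # p(x)   = N(MU_U, S_U^2)
MU_C, S_C = 1.0, 0.5  # p(x|c) = N(MU_C, S_C^2)


def cfg_score(x, sigma, w):
    """CFG score (1 - w) grad log p_sigma(x) + w grad log p_sigma(x | c),
    assuming variance-exploding noising x_sigma = x_0 + sigma * eps."""
    s_uncond = -(x - MU_U) / (S_U**2 + sigma**2)
    s_cond = -(x - MU_C) / (S_C**2 + sigma**2)
    return (1.0 - w) * s_uncond + w * s_cond


def cfg_ode(x, sigmas, w):
    """Euler steps of the probability-flow ODE dx/dsigma = -sigma * score
    along a decreasing noise grid `sigmas` that ends at 0."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s_next - s_cur) * (-s_cur * cfg_score(x, s_cur, w))
    return x


def tilted_target(w):
    """Mean and variance of pi_w(x), proportional to p(x|c)^w p(x)^(1-w)."""
    prec = w / S_C**2 + (1.0 - w) / S_U**2
    mean = (w * MU_C / S_C**2 + (1.0 - w) * MU_U / S_U**2) / prec
    return mean, 1.0 / prec


def plain_cfg(n, w, rng, sigma_max=50.0, n_steps=200):
    """Standard CFG sampling: integrate the guided ODE from sigma_max to 0."""
    sigmas = np.append(np.geomspace(sigma_max, 2e-3, n_steps), 0.0)
    x = sigma_max * rng.standard_normal(n)
    return cfg_ode(x, sigmas, w)


def gibbs_like(n, w, rng, sigma_star=0.1, n_iters=150, n_steps=50):
    """Hypothetical refinement: start from unguided conditional samples, then
    repeatedly re-noise to a small sigma_star and denoise with the CFG score."""
    x = rng.normal(MU_C, S_C, size=n)  # stands in for a w = 1 (no CFG) sample
    sigmas = np.linspace(sigma_star, 0.0, n_steps + 1)
    for _ in range(n_iters):
        x = x + sigma_star * rng.standard_normal(n)  # forward noising step
        x = cfg_ode(x, sigmas, w)  # guided denoising back to sigma = 0
    return x


rng = np.random.default_rng(0)
w = 2.0
mean_t, var_t = tilted_target(w)
x_cfg = plain_cfg(20_000, w, rng)
x_gib = gibbs_like(20_000, w, rng)
print(f"target      mean={mean_t:.3f}  var={var_t:.3f}")
print(f"plain CFG   mean={x_cfg.mean():.3f}  var={x_cfg.var():.3f}")
print(f"Gibbs-like  mean={x_gib.mean():.3f}  var={x_gib.var():.3f}")
```

With $w = 2$ the tilted target is $\mathcal{N}(8/7, 1/7)$. In this toy, plain CFG run from a large noise level produces a much narrower distribution: its variance is $1/16$ for the exact ODE, which echoes the diversity loss described above. The refinement loop settles close to the target variance. A smaller `sigma_star` shrinks the remaining bias, but the chain then mixes more slowly.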