Guiding Diffusion with Deep Geometric Moments: Balancing Fidelity and Variation
Journal: arXiv
Published Date: May 18, 2025
Abstract
Text-to-image generation models have achieved remarkable capabilities in
synthesizing images, but often struggle to provide fine-grained control over
the output. Existing guidance approaches, such as segmentation maps and depth
maps, introduce spatial rigidity that restricts the inherent diversity of
diffusion models. In this work, we introduce Deep Geometric Moments (DGM) as a
novel form of guidance that encapsulates the subject's visual features and
nuances through a learned geometric prior. DINO and CLIP features overemphasize
global image features or semantics. DGMs, in contrast, focus specifically on
the subject itself. ResNet features are sensitive to pixel-wise perturbations,
whereas DGMs rely on robust geometric moments.
Our experiments demonstrate that DGMs effectively balance control and diversity
in diffusion-based image generation, providing a flexible mechanism for
steering the diffusion process.
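The Python sketch below is a minimal, hypothetical illustration of the geometric moments that DGMs rely on. It computes classical, hand-crafted central moments of a small grayscale image. It is not the authors' learned DGM network, and the function names (`raw_moment`, `central_moment`, `moment_descriptor`) are illustrative assumptions.

The sketch shows one sense in which moments are robust. Central moments are measured about the intensity centroid, so translating the subject leaves them unchanged. The raw pixel values at each location, by contrast, change completely.

```python
from typing import Dict, List, Tuple

# Grayscale image as a list of rows: img[y][x] is the intensity at (x, y).
Image = List[List[float]]


def raw_moment(img: Image, p: int, q: int) -> float:
    """Raw geometric moment m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum(
        (x ** p) * (y ** q) * v
        for y, row in enumerate(img)
        for x, v in enumerate(row)
    )


def central_moment(img: Image, p: int, q: int) -> float:
    """Central moment mu_pq, taken about the intensity centroid (cx, cy).

    Measuring positions relative to the centroid removes the dependence on
    where the subject sits in the frame, so mu_pq is translation invariant.
    """
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00
    return sum(
        ((x - cx) ** p) * ((y - cy) ** q) * v
        for y, row in enumerate(img)
        for x, v in enumerate(row)
    )


def moment_descriptor(img: Image, max_order: int = 3) -> Dict[Tuple[int, int], float]:
    """Hypothetical descriptor: all central moments with p + q <= max_order."""
    return {
        (p, q): central_moment(img, p, q)
        for p in range(max_order + 1)
        for q in range(max_order + 1 - p)
    }


if __name__ == "__main__":
    blob = [
        [0, 0, 0, 0, 0],
        [0, 1, 2, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    # The same blob moved one pixel right and one pixel down.
    shifted = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 2, 1],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    a, b = moment_descriptor(blob), moment_descriptor(shifted)
    # Descriptors match up to floating-point rounding.
    print(all(abs(a[k] - b[k]) < 1e-9 for k in a))  # True
```

Central moments are invariant to translation only. Classical scale and rotation invariance need further normalization, such as normalized central moments or Hu's moment invariants.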