Free-Lunch Color-Texture Disentanglement for Stylized Image Generation
Journal: arXiv
Published Date: Mar 18, 2025
Abstract
Recent advances in Text-to-Image (T2I) diffusion models have transformed
image generation, enabling significant progress in stylized generation using
only a few style reference images. However, current diffusion-based methods
struggle with fine-grained style customization due to challenges in controlling
multiple style attributes, such as color and texture. This paper introduces the
first tuning-free approach to achieve free-lunch color-texture disentanglement
in stylized T2I generation, addressing the need for independently controlled
style elements for the Disentangled Stylized Image Generation (DisIG) problem.
Our approach leverages the Image-Prompt Additivity property in the CLIP image
embedding space to develop techniques for separating and extracting
Color-Texture Embeddings (CTE) from individual color and texture reference
images. To ensure that the color palette of the generated image aligns closely
with the color reference, we apply a whitening and coloring transformation to
enhance color consistency. Additionally, to prevent texture loss due to the
signal-leak bias inherent in diffusion training, we introduce a noise term that
preserves textural fidelity during the Regularized Whitening and Coloring
Transformation (RegWCT). Through these methods, our Style Attributes
Disentanglement approach (SADis) delivers a more precise and customizable
solution for stylized image generation. Experiments on images from the WikiArt
and StyleDrop datasets demonstrate that, both qualitatively and quantitatively,
SADis surpasses state-of-the-art stylization methods in the DisIG task. Code
will be released at https://deepffff.github.io/sadis.github.io/.
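
For readers unfamiliar with the whitening and coloring transformation mentioned above, the following is a minimal sketch of the classical form of that transform, written with NumPy. It is not the authors' RegWCT: the abstract does not specify how the noise term enters, so it is omitted here. The function name `whiten_and_color`, the `(channels, samples)` feature layout, and the `eps` ridge term are assumptions made for illustration only.

```python
import numpy as np


def whiten_and_color(content, style, eps=1e-5):
    """Classical whitening and coloring transform (illustrative sketch).

    content: array of shape (C, N), features to be re-colored.
    style:   array of shape (C, M), features whose statistics are the target.
    Returns an array of shape (C, N) whose mean and covariance
    approximately match those of `style`.
    """
    c_mean = content.mean(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    fc = content - c_mean
    fs = style - s_mean

    # Sample covariances, with a small ridge (assumed) for numerical stability.
    identity = np.eye(fc.shape[0])
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * identity
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * identity

    # Symmetric eigendecompositions of both covariance matrices.
    wc, vc = np.linalg.eigh(cov_c)
    ws, vs = np.linalg.eigh(cov_s)

    # Whitening removes the content covariance (cov_c^{-1/2}).
    whiten = vc @ np.diag(wc ** -0.5) @ vc.T
    # Coloring imposes the style covariance (cov_s^{1/2}).
    color = vs @ np.diag(ws ** 0.5) @ vs.T

    return color @ (whiten @ fc) + s_mean
```

In this sketch the output keeps the spatial arrangement of `content` while taking on the first- and second-order statistics of `style`, which is the sense in which such a transform can pull a generated image's palette toward a color reference.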