Training Free Stylized Abstraction
Journal: arXiv
Published Date: May 28, 2025
Abstract
Stylized abstraction synthesizes visually exaggerated yet semantically
faithful representations of subjects, balancing recognizability with perceptual
distortion. Unlike image-to-image translation, which prioritizes structural
fidelity, stylized abstraction demands selective retention of identity cues
while embracing stylistic divergence, a requirement that is especially
challenging for out-of-distribution individuals. We propose a training-free
framework that generates stylized abstractions from a single image. It uses
inference-time scaling in vision large language models (VLLMs) to extract
identity-relevant features, and a novel cross-domain rectified flow inversion
strategy to reconstruct structure based on style-dependent priors. Our method adapts
structural restoration dynamically through style-aware temporal scheduling,
enabling high-fidelity reconstructions that honor both subject and style. It
supports multi-round abstraction-aware generation without fine-tuning. To
evaluate this task, we introduce StyleBench, a GPT-based, human-aligned metric
suited to abstract styles, where pixel-level similarity fails. Experiments
across diverse abstraction styles (e.g., LEGO, knitted dolls, South Park) show
strong generalization to unseen identities and styles in a fully open-source setup.
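The abstract does not spell out the inversion procedure. The following is therefore only a minimal, generic sketch of the idea behind cross-domain rectified flow inversion, not the authors' implementation. The sketch inverts an input toward noise under a source-domain velocity field, then integrates back under a style-domain field. It rests on several assumptions. Each domain is a toy Gaussian N(mu, sigma^2 I) whose exact velocity is available in closed form, standing in for a learned model. The interpolation is x_t = (1 - t) x_0 + t z, with t = 1 as pure noise. Integration uses plain Euler steps. The inversion depth `t0` is chosen per style from a hypothetical table (`STYLE_T0`) as a crude stand-in for style-aware temporal scheduling. All function names are illustrative.

```python
from functools import partial

import numpy as np


def gaussian_velocity(x, t, mu, sigma):
    """Exact rectified-flow velocity E[z - x0 | x_t] for a toy data distribution
    N(mu, sigma^2 I), with x_t = (1 - t) * x0 + t * z and z ~ N(0, I).
    Here t = 0 is data and t = 1 is pure noise (assumed convention)."""
    var = (1.0 - t) ** 2 * sigma**2 + t**2   # Var[x_t], per dimension
    cov = t - (1.0 - t) * sigma**2           # Cov[z - x0, x_t], per dimension
    return -mu + (cov / var) * (x - (1.0 - t) * mu)


def integrate(x, t_start, t_end, velocity, steps):
    """Euler integration of dx/dt = velocity(x, t) from t_start to t_end
    (works in either direction; dt is negative when t_end < t_start)."""
    ts = np.linspace(t_start, t_end, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur)
    return x


def cross_domain_edit(x_src, src_prior, style_prior, t0, steps=200):
    """Hypothetical helper: invert x_src under the source prior up to time t0,
    then integrate back to t = 0 under the style prior.
    Priors are (mu, sigma) tuples."""
    if t0 <= 0.0:
        return x_src.copy()
    v_src = partial(gaussian_velocity, mu=src_prior[0], sigma=src_prior[1])
    v_sty = partial(gaussian_velocity, mu=style_prior[0], sigma=style_prior[1])
    x_t0 = integrate(x_src, 0.0, t0, v_src, steps)   # inversion: data -> partially noised
    return integrate(x_t0, t0, 0.0, v_sty, steps)    # regeneration under the style prior


# Hypothetical style-aware schedule: more abstract styles are inverted deeper.
STYLE_T0 = {"photo": 0.0, "knitted": 0.4, "lego": 0.7}

x_src = np.linspace(-1.0, 1.0, 5)   # toy "image" from the source domain
photo_prior = (0.0, 1.0)            # hypothetical source prior N(0, 1)
lego_prior = (3.0, 0.5)             # hypothetical style prior N(3, 0.5^2)
print(cross_domain_edit(x_src, photo_prior, lego_prior, STYLE_T0["lego"]))
```

The ODE is deterministic, so inverting all the way to t0 = 1 maps the input fully onto the style prior while preserving its standardized layout (here, roughly 3 + 0.5 * x_src). Smaller t0 values keep more of the source statistics. In this toy setting, t0 is the knob a style-aware schedule would tune.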