T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Generation
Journal:
arXiv
Published Date:
Dec 18, 2024
Abstract
Scene generation is crucial to many computer graphics applications. Recent
advances in generative AI have streamlined sketch-to-image workflows, easing
the workload for artists and designers in creating scene concept art. However,
these methods often struggle with complex scenes containing multiple detailed objects,
sometimes missing small or uncommon instances. In this paper, we propose
Training-free Triplet Tuning for Sketch-to-Scene (T$^3$-S2S) generation,
developed after reviewing the entire cross-attention mechanism. This scheme
revitalizes the existing ControlNet model so that it handles multi-instance
generation effectively, through three components: prompt balance,
characteristics prominence, and dense tuning. Specifically, this approach
enhances keyword representation via the
prompt balance module, reducing the risk of missing critical instances. It also
includes a characteristics prominence module that highlights TopK indices in
each channel, ensuring essential features are better represented based on token
sketches. Additionally, it employs dense tuning to refine contour details in
the attention map, compensating for instance-related regions. Experiments
validate that our triplet tuning approach substantially improves the
performance of existing sketch-to-image models. It consistently generates
detailed, multi-instance 2D images, closely adhering to the input prompts and
enhancing visual quality in complex multi-instance scenes. Code is available at
https://github.com/chaos-sun/t3s2s.git.
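
As a rough illustration of what "highlighting TopK indices in each channel" could mean, the sketch below boosts the k largest values in every channel of a small feature map. It is not taken from the paper or its repository: the function name `highlight_topk_per_channel`, the `boost` factor, and the positions-by-channels list layout are all assumptions made for this example, and the actual module may select and reweight features differently.

```python
from typing import List


def highlight_topk_per_channel(
    features: List[List[float]], k: int, boost: float = 2.0
) -> List[List[float]]:
    """Return a copy of `features` with the top-k values of each channel scaled.

    `features` is a list of rows (spatial positions); each row holds one value
    per channel. The layout, the name, and the multiplicative `boost` are
    hypothetical choices for illustration, not the paper's implementation.
    """
    if not features:
        return []
    n_positions = len(features)
    n_channels = len(features[0])
    k = max(0, min(k, n_positions))  # clamp k to a valid range

    out = [row[:] for row in features]  # copy so the input is left untouched
    for c in range(n_channels):
        # Positions ordered by their value in channel c, largest first.
        order = sorted(
            range(n_positions), key=lambda p: features[p][c], reverse=True
        )
        for p in order[:k]:
            out[p][c] = features[p][c] * boost
    return out


# Three positions, two channels; boost the single largest entry per channel.
example = [[1.0, 0.5], [3.0, 0.25], [2.0, 1.0]]
print(highlight_topk_per_channel(example, k=1))
# prints [[1.0, 0.5], [6.0, 0.25], [2.0, 2.0]]
```

Selecting per channel rather than over the whole map means every channel contributes its own strongest positions, which is one plausible reading of the abstract's wording.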