AssetDropper: Asset Extraction via Diffusion Models with Reward-Driven Optimization
Journal:
arXiv
Published Date:
Jun 6, 2025
Abstract
Recent research on generative models has primarily focused on creating
product-ready visual outputs; however, designers often favor access to
standardized asset libraries, a domain that has yet to be significantly
enhanced by generative capabilities. Although open-world scenes provide ample
raw materials for designers, efficiently extracting high-quality, standardized
assets remains a challenge. To address this, we introduce AssetDropper, the
first framework designed to extract assets from reference images, providing
artists with an open-world asset palette. Our model adeptly extracts a front
view of selected subjects from input images, effectively handling complex
scenarios such as perspective distortion and subject occlusion. We establish a
synthetic dataset of more than 200,000 image-subject pairs and a real-world
benchmark of thousands more for evaluation, to facilitate future research
on downstream tasks. Furthermore, to ensure precise asset
extraction that aligns well with the image prompts, we employ a pre-trained
reward model to close the loop with feedback. We design the reward model
to perform the inverse task of pasting the extracted assets back into the
reference sources, which provides an additional consistency signal during
training and mitigates hallucination. Extensive experiments show that, with the
aid of reward-driven optimization, AssetDropper achieves state-of-the-art
results in asset extraction. Project page: AssetDropper.github.io.
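The closed-loop idea of scoring an extracted asset by pasting it back into its reference can be illustrated with a toy sketch. Everything in it is an assumption rather than the authors' method: `paste_back`, `consistency_reward`, and `pick_best` are hypothetical names, images are tiny flattened grayscale lists, the learned inverse (paste-back) model is replaced by a simple mask-based copy, and the reward is a plain negative mean squared error. The sketch uses the reward to rank candidate extractions (best-of-N selection), which only illustrates how a consistency reward penalizes hallucinated content; it is not the paper's reward-driven training procedure.

```python
from typing import List, Sequence

# Flattened grayscale image with pixel values in [0, 1] (illustrative representation only).
Image = List[float]


def paste_back(asset: Sequence[float], mask: Sequence[int], reference: Sequence[float]) -> Image:
    """Hypothetical stand-in for the inverse (paste-back) model.

    A real inverse model would warp and blend the front-view asset into the scene;
    here we simply copy asset pixels into the masked region of the reference.
    """
    return [a if m else r for a, m, r in zip(asset, mask, reference)]


def consistency_reward(asset: Sequence[float], mask: Sequence[int], reference: Sequence[float]) -> float:
    """Hypothetical reward: negative MSE between the re-composited image and the reference.

    A reward of 0.0 means pasting the asset back reproduces the reference exactly.
    """
    recon = paste_back(asset, mask, reference)
    mse = sum((x - y) ** 2 for x, y in zip(recon, reference)) / len(reference)
    return -mse


def pick_best(candidates: Sequence[Image], mask: Sequence[int], reference: Sequence[float]) -> int:
    """Return the index of the candidate extraction with the highest reward (best-of-N)."""
    rewards = [consistency_reward(c, mask, reference) for c in candidates]
    return max(range(len(candidates)), key=rewards.__getitem__)


if __name__ == "__main__":
    reference = [0.1, 0.9, 0.8, 0.2]
    mask = [0, 1, 1, 0]                 # subject occupies pixels 1 and 2
    faithful = [0.0, 0.9, 0.8, 0.0]     # matches the subject inside the mask
    hallucinated = [0.0, 0.3, 0.1, 0.0]  # invents content inside the mask
    print(pick_best([hallucinated, faithful], mask, reference))  # prints 1
```

Only pixels inside the mask affect the reward, so a candidate that hallucinates content for the subject is penalized while the background is ignored.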