PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Journal: arXiv
Published Date: Dec 22, 2024
Abstract
Recent virtual try-on approaches have advanced by fine-tuning pre-trained
text-to-image diffusion models to leverage their powerful generative ability.
However, the use of text prompts in virtual try-on is still underexplored. This
paper tackles a text-editable virtual try-on task that changes the clothing
item based on the provided clothing image while editing the wearing style
(e.g., tucking style, fit) according to the text descriptions. Text-editable
virtual try-on poses three key challenges: (i) designing rich text
descriptions for paired person-clothing data to train the model, (ii)
resolving conflicts in which textual information about the person's existing
clothing interferes with the generation of the new clothing, and (iii)
adaptively adjusting the inpainting mask to align with the text descriptions,
ensuring proper editing areas while preserving the parts of the original
person's appearance that are irrelevant to the new clothing. To address these
challenges, we propose PromptDresser, a
text-editable virtual try-on model that leverages large multimodal model (LMM)
assistance to enable high-quality and versatile manipulation based on
generative text prompts. Our approach uses LMMs via in-context learning to
generate detailed text descriptions for person and clothing images
independently, including pose details and editing attributes, with minimal
human cost. Moreover, to ensure appropriate editing areas, we adaptively adjust
the inpainting mask according to the text prompts. We found that our approach,
utilizing detailed text prompts, not only enhances text editability but also
effectively conveys clothing details that are difficult to capture through
images alone, thereby enhancing image quality. Our code is available at
https://github.com/rlawjdghek/PromptDresser.
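
The abstract does not specify how the inpainting mask is adjusted, so the following is only a minimal sketch of one way a mask could be made to depend on the text prompt. It assumes the mask is a binary grid and that wearing-style keywords map to a dilation radius. The keyword table `STYLE_TO_DILATION`, its values, and the functions `dilate` and `prompt_aware_mask` are hypothetical illustrations and are not taken from the PromptDresser code.

```python
# Hypothetical mapping from wearing-style keywords to mask growth in pixels.
# Keywords and radii are illustrative assumptions, not values from the paper.
STYLE_TO_DILATION = {
    "tucked": 0,
    "untucked": 2,
    "loose fit": 3,
}


def dilate(mask, radius):
    """Grow a binary mask (list of rows of 0/1) using a square window."""
    if radius <= 0:
        return [row[:] for row in mask]  # return a copy, leave input untouched
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Mark every pixel within `radius` of a foreground pixel.
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def prompt_aware_mask(mask, prompt):
    """Dilate the mask by the largest radius among keywords found in the prompt."""
    text = prompt.lower()
    radius = max(
        (r for keyword, r in STYLE_TO_DILATION.items() if keyword in text),
        default=0,
    )
    return dilate(mask, radius)
```

Under these assumptions, a prompt mentioning an untucked or loose-fitting garment enlarges the editable region, while a prompt mentioning a tucked garment leaves the mask unchanged so that more of the original person is preserved.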