MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer
Journal: arXiv
Published Date: Feb 3, 2025
Abstract
The garment-to-person virtual try-on (VTON) task, which aims to generate
fitting images of a person wearing a reference garment, has made significant
strides. However, obtaining a standard garment image is often more challenging than
using a garment already worn by another person. To improve ease of use, we
propose MFP-VTON, a Mask-Free framework for Person-to-Person VTON. Recognizing
the scarcity of person-to-person data, we adapt a garment-to-person model and
dataset to construct a specialized dataset for this task. Our approach builds
upon a pretrained diffusion transformer, leveraging its strong generative
capabilities. During mask-free model fine-tuning, we introduce a Focus
Attention loss to emphasize the garment of the reference person and the details
outside the garment of the target person. Experimental results demonstrate that
our model excels in both person-to-person and garment-to-person VTON tasks,
generating high-fidelity fitting images.