MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input
Journal: arXiv
Published Date: Mar 11, 2025
Abstract
Recent advancements in Virtual Try-On (VITON) have significantly improved
image realism and garment detail preservation, driven by powerful text-to-image
(T2I) diffusion models. However, existing methods often rely on user-provided
masks, which add complexity and degrade performance when the masks are
imperfect, as shown in Fig. 1(a). To address this, we propose a Mask-Free VITON
(MF-VITON) framework that achieves realistic VITON using only a single person
image and a target garment, eliminating the requirement for auxiliary masks.
Our approach introduces a novel two-stage pipeline: (1) We leverage existing
Mask-based VITON models to synthesize a high-quality dataset. This dataset
contains diverse, realistic pairs of person images and corresponding garments,
augmented with varied backgrounds to mimic real-world scenarios. (2) The
pre-trained Mask-based model is fine-tuned on the generated dataset, enabling
garment transfer without mask dependencies. This stage simplifies the input
requirements while preserving garment texture and shape fidelity. Our framework
achieves state-of-the-art (SOTA) performance in garment transfer accuracy and
visual realism. Notably, the proposed Mask-Free model significantly outperforms
existing Mask-based approaches, setting a new benchmark. For
more details, visit our project page: https://zhenchenwan.github.io/MF-VITON/.
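
The data flow of the two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name (`Sample`, `synthesize_dataset`, `mask_free_inputs`, the `mask_based_model` callable) is hypothetical, images are represented as plain strings, and background augmentation is reduced to a random tag. The abstract does not specify the training target or loss used during fine-tuning, so the sketch stops at the point where the mask drops out of the inputs.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = str  # stand-in type (assumption); a real pipeline would use image tensors


@dataclass(frozen=True)
class Sample:
    """One entry of the synthesized dataset (hypothetical layout)."""
    person: Image   # try-on result from the mask-based model, on a varied background
    garment: Image  # garment paired with that person image


def synthesize_dataset(
    people: Sequence[Image],
    masks: Sequence[Image],
    garments: Sequence[Image],
    backgrounds: Sequence[Image],
    mask_based_model: Callable[[Image, Image, Image], Image],
    seed: int = 0,
) -> List[Sample]:
    """Stage 1: use an existing mask-based model to generate training pairs."""
    rng = random.Random(seed)
    samples: List[Sample] = []
    for person, mask in zip(people, masks):
        for garment in garments:
            # The mask is consumed here and never stored in the dataset.
            tried_on = mask_based_model(person, mask, garment)
            # Placeholder for background augmentation (assumption).
            background = rng.choice(backgrounds)
            samples.append(Sample(person=f"{tried_on}|bg={background}", garment=garment))
    return samples


def mask_free_inputs(samples: Sequence[Sample]) -> List[Tuple[Image, Image]]:
    """Stage 2 inputs: only (person, garment) pairs reach the fine-tuned model."""
    return [(s.person, s.garment) for s in samples]
```

The point of the sketch is the interface change: masks appear only as an argument to the Stage 1 generator, while the Stage 2 inputs are plain (person, garment) pairs.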