Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation
Journal: arXiv
Published Date: Jan 5, 2025
Abstract
Facial images have extensive practical applications. Although the current
large-scale text-image diffusion models exhibit strong generation capabilities,
it is challenging to generate the desired facial images using only a text prompt.
Image prompts are a logical choice. However, current methods of this type
generally target the general domain rather than faces. In this paper, we aim to optimize image
makeup techniques to generate the desired facial images. Specifically, (1) we
built a dataset of 4 million high-quality face image-text pairs
(FaceCaptionHQ-4M) based on LAION-Face to train our Face-MakeUp model; (2) to
maintain consistency with the reference facial image, we extract and learn
multi-scale content features and pose features from the facial image and
integrate them into the diffusion model to better preserve facial
identity. Validation on two face-related
test datasets demonstrates that Face-MakeUp achieves the best
overall performance. All code is available at:
https://github.com/ddw2AIGROUP2CQUPT/Face-MakeUp