PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control
Journal: arXiv
Published Date: Dec 2, 2024
Abstract
Recently, diffusion models have shown superior performance in image
inpainting. Inpainting methods based on diffusion models can usually
generate realistic, high-quality content for masked areas. However, owing
to the limitations of diffusion models, existing methods typically struggle
with semantic consistency between the image and the text prompt, and with
accommodating users' editing habits. To address these issues, we present PainterNet, a
plugin that can be flexibly embedded into various diffusion models. To generate
content in the masked areas that closely aligns with the user's input
prompt, we propose local prompt input, Attention Control Points (ACP), and
Actual-Token Attention Loss (ATAL), which strengthen the model's focus on local areas.
Additionally, we redesign the mask generation algorithm for the training and
testing datasets to simulate how users typically apply masks, and introduce a
customized training dataset, PainterData, and a benchmark dataset,
PainterBench. Extensive experiments show that PainterNet
surpasses existing state-of-the-art models on key metrics, including image
quality and global/local text consistency.