DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics
Journal: arXiv
Published Date: Mar 21, 2025
Abstract
This paper presents a novel approach to improving text-guided image editing
using diffusion-based models. The text-guided image editing task poses the key
challenge of precisely locating and editing the target semantics, and previous
methods fall short in this respect. Our method introduces a Precise Semantic
Localization strategy that leverages visual and textual self-attention to
enhance the cross-attention map, which can serve as a regional cue to improve
editing performance. We then propose a Dual-Level Control mechanism for
incorporating regional cues at both feature and latent levels, offering
fine-grained control for more precise edits. To fully compare our method with
other DiT-based approaches, we construct the RW-800 benchmark, featuring
high-resolution images, long descriptive texts, real-world images, and a new text
editing task. Experimental results on the popular PIE-Bench and RW-800
benchmarks demonstrate the superior performance of our approach in preserving
the background and providing accurate edits.
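
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea, not the paper's actual method. It assumes that a cross-attention map can be refined by propagating it through image-side and text-side self-attention matrices, and that the resulting regional cue can gate a blend between edited and source tensors (features or latents). The function names `regional_cue` and `blend_with_cue`, the matrix-product refinement, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def regional_cue(cross_attn, visual_self_attn, text_self_attn, token_idx):
    """Hypothetical refinement of a cross-attention map (assumed form, not from the paper).

    cross_attn:       (N, T) attention from N image tokens to T text tokens
    visual_self_attn: (N, N) image-to-image self-attention
    text_self_attn:   (T, T) text-to-text self-attention
    token_idx:        index of the text token describing the edit target
    """
    # Propagate relevance across related image tokens and related text tokens.
    refined = visual_self_attn @ cross_attn @ text_self_attn.T
    cue = refined[:, token_idx]
    # Min-max normalize to [0, 1] so the cue can be thresholded.
    lo, hi = cue.min(), cue.max()
    return (cue - lo) / (hi - lo + 1e-8)

def blend_with_cue(edited, source, cue, threshold=0.5):
    """Keep edited values inside the cue region and source values elsewhere.

    edited, source: (N, C) per-token features or latents
    cue:            (N,) regional cue in [0, 1]
    """
    mask = (cue > threshold).astype(edited.dtype)[:, None]  # (N, 1), broadcast over channels
    return mask * edited + (1.0 - mask) * source

# Toy usage: 4 image tokens, 2 text tokens, identity self-attention.
cross = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
cue = regional_cue(cross, np.eye(4), np.eye(2), token_idx=0)
blended = blend_with_cue(np.ones((4, 3)), np.zeros((4, 3)), cue)
```

Under these assumptions, the same blend could be applied once to intermediate features and once to latents, which is one plausible reading of "dual-level" control; the paper itself should be consulted for the actual formulation.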