Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance
Journal: arXiv
Published Date: Mar 28, 2025
Abstract
Pre-trained conditional diffusion models have demonstrated remarkable
potential in image editing. However, they often struggle with temporal
consistency, particularly in the talking head domain, where continuous changes
in facial expressions make the task harder. These issues stem from editing
individual frames independently and from the loss of temporal continuity
during the editing process. In this paper, we introduce Follow Your
Motion (FYM), a generic framework for maintaining temporal consistency in
portrait editing. Specifically, given portrait images rendered by a pre-trained
3D Gaussian Splatting model, we first develop a diffusion model that
learns motion trajectory changes at different scales and pixel coordinates,
from the first frame to each subsequent frame. This approach ensures that the
temporally inconsistent edited avatars inherit the motion
information from the rendered avatars. Second, to maintain fine-grained
temporal consistency of facial expressions in talking head editing, we propose
a dynamic re-weighted attention mechanism. This mechanism assigns higher weight
coefficients to spatial landmark points and dynamically updates these weights
based on the landmark loss, yielding more consistent and refined facial
expressions. Extensive experiments demonstrate that our method outperforms
existing approaches in temporal consistency and can be used to optimize and
compensate for temporally inconsistent outputs in a range of applications,
such as text-driven editing and relighting.
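The first-frame-relative motion targets mentioned in the abstract can be illustrated with a small NumPy sketch under explicit assumptions. Here motion is represented by hypothetical 2D point tracks `tracks` of shape (T, K, 2) in (x, y) pixel coordinates, displacements are measured from frame 0, and each scale s stores them on a grid downsampled by s. The helper names `relative_trajectories` and `multiscale_displacement_maps` and the rasterization scheme are illustrative only; the abstract does not specify how FYM encodes trajectories.

```python
import numpy as np

def relative_trajectories(tracks):
    """Displacement of each tracked point from its position in the first frame.

    tracks: (T, K, 2) array of (x, y) pixel coordinates for K points over T frames.
    Returns a (T, K, 2) array; row 0 is all zeros by construction.
    """
    return tracks - tracks[0:1]

def multiscale_displacement_maps(tracks, image_size, scales=(1, 2, 4)):
    """Hypothetical multi-scale target (not from the paper): one (T, H//s, W//s, 2) map per scale s.

    Each point's first-frame-relative displacement, divided by s (i.e. expressed
    in that grid's pixel units), is written at the cell the point occupied in
    frame 0. Cells without a point stay zero; if two points share a cell, only
    one of them is kept.
    """
    H, W = image_size
    disp = relative_trajectories(tracks)
    maps = {}
    for s in scales:
        h, w = H // s, W // s
        m = np.zeros((tracks.shape[0], h, w, 2))
        # Anchor each point at its frame-0 location, downsampled to this scale.
        x0 = np.clip((tracks[0, :, 0] / s).astype(int), 0, w - 1)
        y0 = np.clip((tracks[0, :, 1] / s).astype(int), 0, h - 1)
        # Advanced indexing over (y0, x0) selects a (T, K, 2) slice matching disp.
        m[:, y0, x0, :] = disp / s
        maps[s] = m
    return maps
```

Anchoring every frame's displacement at the frame-0 location keeps each map aligned with the first frame, which mirrors the "from the first frame to each subsequent frame" framing; a dense optical-flow field would be a natural substitute for sparse tracks.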
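The dynamic re-weighted attention described in the abstract can be sketched in NumPy as follows. Everything here is an assumption rather than the authors' implementation: the per-token weights enter as an additive log-bias on the attention logits, so key j's unnormalized score is multiplied by w_j, and the hypothetical rule `update_landmark_weights` gives each landmark token a base weight scaled by its loss relative to the mean landmark loss. The parameters `base` and `alpha` are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reweighted_attention(q, k, v, token_weights):
    """Scaled dot-product attention with a positive weight per key token.

    q: (Nq, d), k: (Nk, d), v: (Nk, dv), token_weights: (Nk,) all > 0.
    Adding log(w_j) to every logit multiplies key j's unnormalized score by w_j;
    the softmax then renormalizes each row to sum to 1.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + np.log(token_weights)[None, :]
    attn = softmax(logits, axis=-1)
    return attn @ v, attn

def update_landmark_weights(num_tokens, landmark_mask, landmark_loss, base=2.0, alpha=1.0):
    """Hypothetical update rule (not from the paper).

    Non-landmark tokens get weight 1. Landmark tokens get base * (1 + alpha * r),
    where r is each landmark's loss divided by the mean landmark loss, so
    landmarks with currently higher loss receive more attention.
    """
    weights = np.ones(num_tokens)
    ratio = landmark_loss / (landmark_loss.mean() + 1e-8)
    weights[landmark_mask] = base * (1.0 + alpha * ratio)
    return weights
```

In a training loop, the "dynamic" part would amount to recomputing the weights from the current per-landmark loss at each step before calling `reweighted_attention`. Using a log-bias instead of rescaling the softmax output keeps every attention row a valid probability distribution without a separate renormalization step.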