Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss
Journal: arXiv
Published Date: Jan 13, 2025
Abstract
In this paper, we address the challenge of generating temporally consistent
videos with motion guidance. While many existing methods depend on additional
control modules or inference-time fine-tuning, recent studies suggest that
effective motion guidance is achievable without altering the model architecture
or requiring extra training. Such approaches offer promising compatibility with
various video generation foundation models. However, existing training-free
methods often struggle to maintain temporal coherence across frames
or to follow the guided motion accurately. In this work, we propose a simple yet
effective solution that combines an initial-noise-based approach with a novel
motion consistency loss, the latter being our key innovation. Specifically, we
use the inter-frame correlation patterns of intermediate features
from a video diffusion model to represent the motion pattern of the reference
video. We then design a motion consistency loss to maintain similar feature
correlation patterns in the generated video, using the gradient of this loss in
the latent space to guide the generation process for precise motion control.
This approach improves temporal consistency across various motion control tasks
while preserving the benefits of a training-free setup. Extensive experiments
show that our method sets a new standard for efficient, temporally coherent
video generation.
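
The loss described above can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the correlation is taken as cosine similarity between tokens of consecutive frames, the loss is a mean squared difference between correlation maps, the feature extractor is the identity (a real pipeline would read intermediate features from the video diffusion model), and the gradient is estimated by finite differences rather than backpropagation. The function names are hypothetical.

```python
import numpy as np


def interframe_correlation(feats):
    # feats has shape (F, N, C): frames, spatial tokens, channels.
    unit = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    # Cosine similarity between each token in frame f and each token in frame f+1.
    return np.einsum("fnc,fmc->fnm", unit[:-1], unit[1:])


def motion_consistency_loss(ref_feats, gen_feats):
    # Assumed form: mean squared difference between the two correlation patterns.
    diff = interframe_correlation(gen_feats) - interframe_correlation(ref_feats)
    return float(np.mean(diff ** 2))


def latent_gradient(ref_feats, latent, extract=lambda z: z, eps=1e-4):
    # Central finite differences; `extract` stands in for the diffusion
    # model's feature extractor (identity here, purely as a toy assumption).
    grad = np.zeros_like(latent)
    probe = latent.copy()
    for idx in np.ndindex(latent.shape):
        orig = probe[idx]
        probe[idx] = orig + eps
        up = motion_consistency_loss(ref_feats, extract(probe))
        probe[idx] = orig - eps
        down = motion_consistency_loss(ref_feats, extract(probe))
        probe[idx] = orig
        grad[idx] = (up - down) / (2 * eps)
    return grad


def guided_step(ref_feats, latent, step_size=1e-3):
    # Move the latent against the loss gradient, as in gradient-based guidance.
    return latent - step_size * latent_gradient(ref_feats, latent)


rng = np.random.default_rng(0)
ref = rng.normal(size=(3, 4, 8))     # toy reference-video features
latent = rng.normal(size=(3, 4, 8))  # toy latent of the video being generated
before = motion_consistency_loss(ref, latent)
after = motion_consistency_loss(ref, guided_step(ref, latent))
print(before, after)
```

In this toy setting the guidance step nudges the latent so that its frame-to-frame correlation maps move toward those of the reference. In a sampler, a step of this kind would be interleaved with the denoising updates, and the gradient would normally come from automatic differentiation through the model rather than finite differences.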