MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
Journal: arXiv
Published Date: Mar 22, 2025
Abstract
Generative models have made remarkable advances and can produce high-quality
content. However, controllable editing with generative models remains
challenging because of the inherent uncertainty of their outputs. This challenge
is particularly pronounced in motion editing, which involves processing spatial
information. While some physics-based generative methods have attempted motion
editing, they typically operate on single-view images with simple motions, such
as translation and dragging. These methods struggle to handle complex rotation
and stretching motions and to ensure multi-view consistency, and they often
require resource-intensive retraining. To address these challenges, we propose
MotionDiff, a training-free zero-shot diffusion method that leverages optical
flow for complex multi-view motion editing. Specifically, given a static scene,
users can interactively select objects of interest and add motion priors to
them. The proposed Point Kinematic Model (PKM) then estimates the corresponding
multi-view optical flows in the Multi-view Flow Estimation Stage (MFES).
Subsequently, these optical flows are used to generate multi-view motion results
through a decoupled motion representation in the Multi-view Motion Diffusion Stage
(MMDS). Extensive experiments demonstrate that MotionDiff outperforms other
physics-based generative motion editing methods in producing high-quality,
multi-view consistent motion results. Notably, because MotionDiff requires no
retraining, users can conveniently adapt it to various downstream tasks.
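The abstract does not say how the Point Kinematic Model turns a user's motion prior into multi-view optical flow. As a rough intuition only, the sketch below assumes that the selected object is a set of 3D points, that the motion prior is a stretch and rotation about a pivot followed by a translation, and that each view is an ideal pinhole camera. It moves every point, projects the original and moved positions into each view, and takes the pixel difference as that view's flow. `MotionPrior`, `PinholeCamera`, and `multiview_flow` are hypothetical names; this is not the authors' PKM.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def rot_z(theta: float) -> Mat3:
    """Rotation matrix for an angle `theta` (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def matvec(m: Mat3, v: Vec3) -> Vec3:
    """3x3 matrix times 3-vector."""
    return (
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    )


@dataclass
class MotionPrior:
    """Hypothetical motion prior: stretch and rotate about a pivot, then translate."""
    pivot: Vec3 = (0.0, 0.0, 0.0)
    rotation: Mat3 = IDENTITY
    stretch: Vec3 = (1.0, 1.0, 1.0)  # per-axis scale factors
    translation: Vec3 = (0.0, 0.0, 0.0)

    def apply(self, p: Vec3) -> Vec3:
        # Express the point relative to the pivot and stretch it per axis.
        local = tuple((p[i] - self.pivot[i]) * self.stretch[i] for i in range(3))
        # Rotate about the pivot, move back, then translate.
        r = matvec(self.rotation, local)
        return tuple(self.pivot[i] + r[i] + self.translation[i] for i in range(3))


@dataclass
class PinholeCamera:
    """Assumed ideal pinhole: world-to-camera R, t; focal length f; principal point."""
    R: Mat3
    t: Vec3
    f: float
    cx: float = 0.0
    cy: float = 0.0

    def project(self, p: Vec3) -> Tuple[float, float]:
        x, y, z = (a + b for a, b in zip(matvec(self.R, p), self.t))
        if z <= 0:
            raise ValueError("point is behind the camera")
        return (self.f * x / z + self.cx, self.f * y / z + self.cy)


def multiview_flow(points: Sequence[Vec3], prior: MotionPrior,
                   cameras: Sequence[PinholeCamera]) -> List[List[Tuple[float, float]]]:
    """Per-view 2D flow (du, dv) of each point under the motion prior."""
    moved = [prior.apply(p) for p in points]
    flows = []
    for cam in cameras:
        view = []
        for p, q in zip(points, moved):
            u0, v0 = cam.project(p)
            u1, v1 = cam.project(q)
            view.append((u1 - u0, v1 - v0))
        flows.append(view)
    return flows


cams = [PinholeCamera(IDENTITY, (0.0, 0.0, 0.0), f=100.0),
        PinholeCamera(IDENTITY, (0.0, 0.0, 5.0), f=100.0)]
flows = multiview_flow([(0.0, 0.0, 5.0)], MotionPrior(translation=(1.0, 0.0, 0.0)), cams)
print(flows)  # [[(20.0, 0.0)], [(10.0, 0.0)]]
```

The same 3D translation yields a smaller flow in the farther view, which illustrates why per-view flows must be derived from one shared 3D motion to stay multi-view consistent.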