MotionBridge: Dynamic Video Inbetweening with Flexible Controls
Journal: arXiv
Published Date: Dec 17, 2024
Abstract
By generating plausible and smooth transitions between two image frames,
video inbetweening is an essential tool for video editing and long video
synthesis. Traditional methods lack the capability to generate complex, large
motions. While recent video generation techniques are powerful in creating
high-quality results, they often lack fine control over the details of
intermediate frames, which can lead to results that do not align with the
user's creative intent. We introduce MotionBridge, a unified video inbetweening
framework that allows flexible controls, including trajectory strokes,
keyframes, masks, guide pixels, and text. However, learning such multi-modal
controls in a unified framework is a challenging task. We thus design two
generators to extract the control signal faithfully and encode features through
dual-branch embedders to resolve ambiguities. We further introduce a curriculum
training strategy to smoothly learn various controls. Extensive qualitative and
quantitative experiments demonstrate that such multi-modal controls
enable a more dynamic, customizable, and contextually accurate visual
narrative.