Learning Camera Movement Control from Real-World Drone Videos
Journal: arXiv
Published Date: Dec 12, 2024
Abstract
This study seeks to automate camera movement control for filming existing
subjects into attractive videos, in contrast to creating non-existent
content by directly generating pixels. We select drone videos as our test
case due to their rich and challenging motion patterns, distinctive viewing
angles, and precise controls. Existing AI videography methods struggle with
limited appearance diversity in simulation training, high costs of recording
expert operations, and difficulties in designing heuristic-based goals to cover
all scenarios. To avoid these issues, we propose a scalable method that
involves collecting real-world training data to improve diversity, extracting
camera trajectories automatically to minimize annotation costs, and training an
effective architecture that does not rely on heuristics. Specifically, we
collect 99k high-quality trajectories by running 3D reconstruction on online
videos, connecting camera poses from consecutive frames to formulate 3D camera
paths, and using a Kalman filter to identify and remove low-quality data.
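The abstract does not say exactly how the Kalman filter decides that a trajectory is low quality. The following is a minimal sketch under stated assumptions: it runs a constant-velocity Kalman filter over per-frame camera positions and flags a trajectory when the normalized innovation squared (NIS) at any step exceeds a threshold. Large innovations mark abrupt pose jumps, which would plausibly indicate reconstruction errors. The function names `kalman_innovations` and `is_low_quality`, the noise parameters `q` and `r`, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import numpy as np


def kalman_innovations(positions, dt=1.0, q=1e-2, r=1e-2):
    """Run a constant-velocity Kalman filter over a (T, 3) array of camera
    positions and return the normalized innovation squared (NIS) per step.
    Hypothetical helper; q and r are assumed process/measurement noise."""
    positions = np.asarray(positions, dtype=float)
    # State vector: [x, y, z, vx, vy, vz]
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                    # position += velocity * dt
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is observed
    Q = q * np.eye(6)
    R = r * np.eye(3)

    x = np.zeros(6)
    x[:3] = positions[0]  # start at the first pose with zero velocity
    P = np.eye(6)
    nis = []
    for z in positions[1:]:
        # Predict the next state and its covariance.
        x = F @ x
        P = F @ P @ F.T + Q
        # Innovation (measurement residual) and its covariance.
        y = z - H @ x
        S = H @ P @ H.T + R
        nis.append(float(y @ np.linalg.solve(S, y)))
        # Standard Kalman update.
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(6) - K @ H) @ P
    return np.array(nis)


def is_low_quality(positions, threshold=50.0, **kwargs):
    """Flag a trajectory if any step's NIS exceeds a (hypothetical) threshold."""
    nis = kalman_innovations(positions, **kwargs)
    return bool(np.any(nis > threshold))
```

With this sketch, a smoothly moving camera stays below the threshold, while a sudden 10-unit jump in position is flagged. A real pipeline would probably also track orientation, not just position.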
Moreover, we introduce DVGFormer, an auto-regressive transformer that leverages
the camera path and images from all past frames to predict camera movement in
the next frame. We evaluate our system across 38 synthetic natural scenes and 7
real city 3D scans. We show that our system effectively learns to perform
challenging camera movements such as navigating through obstacles, maintaining
low altitude to increase perceived speed, and orbiting towers and buildings,
which are very useful for recording high-quality videos. Data and code are
available at dvgformer.github.io.
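The auto-regressive prediction described above can be illustrated with a generic rollout loop. This is a minimal sketch, not DVGFormer itself. A hypothetical `predict_next_motion` callable stands in for the transformer: at each step it receives the full history of camera poses and rendered frames, returns a camera movement, and the frame rendered from the resulting pose is fed back as context for the next step. The names `rollout`, `render`, and `predict_next_motion` are illustrative assumptions. The pose is also simplified to a 3D position, whereas real camera control would include orientation.

```python
from typing import Callable, List

import numpy as np

Pose = np.ndarray   # simplified (hypothetical): 3D camera position only
Frame = np.ndarray  # an image rendered from a pose


def rollout(
    init_pose: Pose,
    render: Callable[[Pose], Frame],
    predict_next_motion: Callable[[List[Pose], List[Frame]], np.ndarray],
    num_steps: int,
) -> List[Pose]:
    """Auto-regressively generate a camera path, one frame at a time."""
    poses = [np.asarray(init_pose, dtype=float)]
    frames = [render(poses[0])]
    for _ in range(num_steps):
        # The predictor conditions on every past pose and frame.
        delta = predict_next_motion(poses, frames)
        next_pose = poses[-1] + delta
        poses.append(next_pose)
        # The frame seen from the predicted pose becomes new context.
        frames.append(render(next_pose))
    return poses
```

For example, a dummy predictor that always returns `[1.0, 0.0, 0.0]`, combined with a renderer that returns a blank image, moves the camera one unit along x per step. In an evaluation setting like the one described, `render` would presumably draw frames from a synthetic scene or a city 3D scan.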