Dynamic View Synthesis as an Inverse Problem
Journal:
arXiv
Published Date:
Jun 9, 2025
Abstract
In this work, we address dynamic view synthesis from monocular videos as an
inverse problem in a training-free setting. By redesigning the noise
initialization phase of a pre-trained video diffusion model, we enable
high-fidelity dynamic view synthesis without any weight updates or auxiliary
modules. We begin by identifying a fundamental obstacle to deterministic
inversion arising from zero-terminal signal-to-noise ratio (SNR) schedules and
resolve it by introducing a novel noise representation, termed K-order
Recursive Noise Representation. We derive a closed form expression for this
representation, enabling precise and efficient alignment between the
VAE-encoded and the DDIM inverted latents. To synthesize newly visible regions
resulting from camera motion, we introduce Stochastic Latent Modulation, which
performs visibility aware sampling over the latent space to complete occluded
regions. Comprehensive experiments demonstrate that dynamic view synthesis can
be effectively performed through structured latent manipulation in the noise
initialization phase.