T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates
Journal: arXiv
Published Date: Jul 10, 2025
Abstract
Recent advances in video generation techniques have given rise to an emerging
paradigm of generative video coding, aiming to achieve semantically accurate
reconstructions in Ultra-Low Bitrate (ULB) scenarios by leveraging strong
generative priors. However, most existing methods are limited by domain
specificity (e.g., facial or human videos) or an excessive dependence on
high-level text guidance, which often fails to capture motion details and
results in unrealistic reconstructions. To address these challenges, we propose
a Trajectory-Guided Generative Video Coding framework (dubbed T-GVC). T-GVC
employs a semantic-aware sparse motion sampling pipeline that bridges low-level
motion tracking and high-level semantic understanding: it extracts pixel-wise
motion as sparse trajectory points selected by their semantic importance, which
significantly reduces the bitrate while preserving critical temporal semantic
information. In addition, by incorporating
trajectory-aligned loss constraints into diffusion processes, we introduce a
training-free latent space guidance mechanism to ensure physically plausible
motion patterns without sacrificing the inherent capabilities of generative
models. Experimental results demonstrate that our framework outperforms both
traditional codecs and state-of-the-art end-to-end video compression methods
under ULB conditions. Furthermore, additional experiments confirm that our
approach achieves more precise motion control than existing text-guided
methods, paving the way for a novel direction of generative video coding guided
by geometric motion modeling.
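
The two ideas summarized above, keeping only the semantically most important
trajectories and steering generation with a trajectory-aligned loss, can be
illustrated with a toy NumPy sketch. This is not the authors' implementation:
the function names, the top-k selection rule, the squared-error loss, and the
plain gradient step are all assumptions made for illustration. In an actual
diffusion model, the loss would presumably be computed on latent features and
its gradient obtained by automatic differentiation at each denoising step.

```python
import numpy as np


def sample_sparse_trajectories(tracks, importance, k):
    """Keep the k tracks with the highest semantic importance (hypothetical rule).

    tracks:     (N, T, 2) array of dense point tracks over T frames.
    importance: (N,) array of per-track semantic scores.
    Returns the selected indices and the selected tracks.
    """
    k = min(k, len(importance))
    idx = np.argsort(-importance, kind="stable")[:k]
    return idx, tracks[idx]


def trajectory_loss_and_grad(pred, target):
    """Mean squared error between predicted and target positions.

    Returns the loss and its analytic gradient with respect to pred.
    """
    diff = pred - target
    loss = float(np.mean(diff ** 2))
    grad = 2.0 * diff / diff.size
    return loss, grad


def guided_step(pred, target, step_size):
    """One gradient step that pulls pred toward the transmitted trajectories.

    Returns the updated prediction and the loss measured before the update.
    """
    loss, grad = trajectory_loss_and_grad(pred, target)
    return pred - step_size * grad, loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tracks = rng.uniform(0.0, 64.0, size=(100, 8, 2))  # synthetic dense tracks
    importance = rng.uniform(size=100)                 # synthetic semantic scores

    idx, sparse = sample_sparse_trajectories(tracks, importance, k=5)

    # Stand-in for the motion implied by the generator's current state.
    pred = sparse + rng.normal(scale=2.0, size=sparse.shape)
    for step in range(20):
        pred, loss = guided_step(pred, sparse, step_size=10.0)
    print("kept tracks:", idx.tolist())
    print("loss before final step:", loss)
```

Only the selected trajectory points would need to be transmitted in such a
scheme, which is where the bitrate saving comes from. The guidance step changes
no model weights, which matches the training-free character described in the
abstract.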