GIViC: Generative Implicit Video Compression
Journal: arXiv
Published Date: Mar 25, 2025
Abstract
While video compression based on implicit neural representations (INRs) has
recently demonstrated great potential, existing INR-based video codecs still
fall short of the state-of-the-art (SOTA) performance achieved by their
conventional or autoencoder-based counterparts under the same coding
configuration. In this context, we propose a Generative Implicit Video
Compression framework, GIViC, which aims to push the performance limits of
this class of coding methods. GIViC is inspired by the characteristics that
INRs share with large language models and diffusion models in exploiting
long-term dependencies. Through a newly designed implicit diffusion process,
GIViC performs diffusive sampling across coarse-to-fine spatiotemporal
decompositions, progressing gradually from coarser-grained full-sequence
diffusion to finer-grained per-token diffusion. A novel Hierarchical Gated
Linear Attention-based transformer (HGLA) is also integrated into the
framework; it dual-factorizes global dependency modeling along the scale and
sequential axes. The proposed GIViC model has been benchmarked against SOTA
conventional and neural codecs under the Random Access (RA) configuration
(YUV 4:2:0, GOPSize=32), where it achieves BD-rate savings of 15.94%, 22.46%
and 8.52% over VVC VTM, DCVC-FM and NVRC, respectively. To the best of our
knowledge, GIViC is the first INR-based video codec to outperform VTM under
the RA coding configuration. The source code will be made available.
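The BD-rate figures quoted in the abstract follow the Bjøntegaard delta-rate idea. A curve of log-bitrate against quality is fitted for each codec, and the average gap between the two curves is taken over their overlapping quality range. That gap is then converted back to a percentage rate difference, so a saving of 15.94% corresponds to a BD-rate of -15.94% relative to the anchor. The sketch below is a minimal pure-Python version under stated assumptions. It uses cubic least-squares fits of ln(bitrate) versus PSNR, the classic variant. The function name `bd_rate` and the RD points in the example are hypothetical. This is not the authors' evaluation script, which may use a different interpolation (for example piecewise cubic) or a different quality metric.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _polyfit(xs, ys, deg):
    """Least-squares polynomial fit via normal equations; returns c with p(x) = sum c[i] x^i."""
    k = deg + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return _solve(ata, aty)


def _integral(c, lo, hi):
    """Definite integral of the polynomial with coefficients c over [lo, hi]."""
    return sum(ci / (i + 1) * (hi ** (i + 1) - lo ** (i + 1)) for i, ci in enumerate(c))


def bd_rate(anchor, test):
    """Bjontegaard delta rate (%) of `test` relative to `anchor`.

    Each argument is a list of (bitrate, psnr) points (assumed: at least 4 per codec).
    Negative results mean the test codec needs fewer bits for the same PSNR.
    """
    psnrs = [p for _, p in anchor + test]
    # Center and scale PSNR with shared constants so the cubic fit is well conditioned.
    center = sum(psnrs) / len(psnrs)
    scale = (max(psnrs) - min(psnrs)) or 1.0

    def fit(points):
        xs = [(p - center) / scale for _, p in points]
        ys = [math.log(r) for r, _ in points]
        return _polyfit(xs, ys, 3), min(xs), max(xs)

    ca, lo_a, hi_a = fit(anchor)
    ct, lo_t, hi_t = fit(test)
    lo, hi = max(lo_a, lo_t), min(hi_a, hi_t)  # overlapping quality range
    # Average log-rate gap over the overlap (the affine PSNR mapping leaves averages unchanged).
    avg_diff = (_integral(ct, lo, hi) - _integral(ca, lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1.0) * 100.0
```

For example, suppose `test` reaches exactly the same PSNR values as `anchor` using 90% of its bitrate at every point. Then `bd_rate(anchor, test)` returns approximately -10.0, which is a 10% BD-rate saving.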
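HGLA is described as a hierarchical transformer built on gated linear attention. The sketch below illustrates only that building block, as the standard causal gated linear attention (GLA) recurrence in pure Python. A d_k × d_v state is decayed row-wise by a gate α_t in (0, 1)^{d_k} and then updated with the outer product of key and value: S_t = diag(α_t) S_{t-1} + k_t^T v_t. Each output is read out as o_t = q_t S_t. This is not the authors' HGLA. The hierarchical factorization along the scale axis is not modeled, and the gates are passed in directly instead of being computed from the input, an assumption made for brevity. The function name `gated_linear_attention` is hypothetical.

```python
from typing import List

Vector = List[float]


def gated_linear_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                           alpha: List[Vector]) -> List[Vector]:
    """Causal gated linear attention computed as a recurrence over the sequence.

    q, k, alpha: T vectors of length d_k (gate values assumed to lie in (0, 1]);
    v: T vectors of length d_v.
    State update: S_t = diag(alpha_t) S_{t-1} + outer(k_t, v_t); output: o_t = q_t S_t.
    Gates are supplied by the caller here (hypothetical simplification); in GLA they
    are produced from the input by a learned projection.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # S_0 = 0, shape d_k x d_v
    outputs = []
    for q_t, k_t, v_t, a_t in zip(q, k, v, alpha):
        # Forget: scale row i of the state by gate a_t[i]; write: add k_t[i] * v_t[j].
        state = [[a_t[i] * state[i][j] + k_t[i] * v_t[j] for j in range(d_v)]
                 for i in range(d_k)]
        # Read: project the state with the query, giving a d_v-dimensional output.
        outputs.append([sum(q_t[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs
```

Because the state has a fixed size, the cost per token does not grow with sequence length. This property makes linear attention attractive for modeling long-range dependencies over long token sequences. With every gate set to 1, the recurrence reduces to plain (unnormalized) causal linear attention, o_t = Σ_{s≤t} (q_t · k_s) v_s.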