Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
Journal:
arXiv
Published Date:
Feb 24, 2025
Abstract
Generative-model-based compact video compression typically operates within
a relatively narrow range of bitrates, often with an emphasis on ultra-low
rate applications. There has been an increasing consensus in the video
communication industry that full bitrate coverage should be enabled by
generative coding. However, this is an extremely difficult task, largely
because generation and compression, although related, have distinct goals and
trade-offs. The proposed Pleno-Generation (PGen) framework improves the
robustness of video coding by using bandwidth intelligence to exploit a wider
range of bandwidth for generation. We begin our study of PGen with face video
coding, where PGen offers a paradigm shift that prioritizes high-fidelity
reconstruction over a compact bitstream. The novel PGen framework
leverages scalable representation and layered reconstruction for Generative
Face Video Compression (GFVC), in an attempt to imbue the bitstream with
intelligence at different granularities. Experimental results show that the
proposed PGen framework helps existing GFVC algorithms deliver high-fidelity,
faithful face videos. In addition, the proposed framework allows greater
flexibility for coding applications and shows superior rate-distortion (RD)
performance over a much wider bitrate range across various quality measures.
Moreover, in comparison with the latest Versatile Video Coding (VVC) codec,
the proposed scheme achieves competitive Bjøntegaard-delta-rate savings for
perceptual-level evaluations.
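
For readers unfamiliar with the Bjøntegaard-delta-rate (BD-rate) metric mentioned above, the following is a minimal sketch of the commonly used cubic-fit formulation, not the authors' evaluation code. The function name `bd_rate`, its argument names, and the sample numbers are illustrative assumptions; the quality axis can be any score where higher is better (for perceptual metrics where lower is better, the score would need to be negated first).

```python
import numpy as np


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Average bitrate difference (in percent) of test vs. anchor at equal quality.

    Hypothetical helper: fits a cubic polynomial to log(rate) as a function of
    quality for each codec, integrates both fits over the overlapping quality
    interval, and converts the mean log-rate gap back to a percentage.
    Negative values mean the test codec needs fewer bits than the anchor.
    """
    log_a = np.log(np.asarray(anchor_rates, dtype=float))
    log_t = np.log(np.asarray(test_rates, dtype=float))
    q_a = np.asarray(anchor_quality, dtype=float)
    q_t = np.asarray(test_quality, dtype=float)

    # Cubic fit of log-rate versus quality (needs at least four RD points each).
    poly_a = np.polyfit(q_a, log_a, 3)
    poly_t = np.polyfit(q_t, log_t, 3)

    # Only the quality range covered by both curves is compared.
    lo = max(q_a.min(), q_t.min())
    hi = min(q_a.max(), q_t.max())
    if hi <= lo:
        raise ValueError("RD curves do not overlap in quality")

    # Definite integrals of both fitted curves over [lo, hi].
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, mapped back to a relative rate change.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Made-up RD points: the test codec uses half the anchor's rate at every quality.
anchor = ([100, 200, 400, 800], [30, 33, 36, 39])
test = ([50, 100, 200, 400], [30, 33, 36, 39])
saving = bd_rate(anchor[0], anchor[1], test[0], test[1])
```

With these made-up points the test curve sits at exactly half the anchor's rate for every quality level, so the result is a 50% rate saving (a BD-rate of about -50).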