Conditional Latent Coding with Learnable Synthesized Reference for Deep Image Compression
Journal: arXiv
Published Date: Feb 14, 2025
Abstract
In this paper, we study how to synthesize a dynamic reference from an
external dictionary to perform conditional coding of the input image in the
latent domain and how to learn the conditional latent synthesis and coding
modules in an end-to-end manner. Our approach begins by constructing a
universal image feature dictionary using a multi-stage approach involving
modified spatial pyramid pooling, dimension reduction, and multi-scale feature
clustering. For each input image, we learn to synthesize a conditioning latent
by selecting and combining relevant features from the dictionary, which
significantly enhances the model's capability in capturing and exploiting image
source correlation. This conditional latent synthesis involves a
correlation-based feature matching and alignment strategy, comprising a
Conditional Latent Matching (CLM) module and a Conditional Latent Synthesis
(CLS) module. The synthesized latent is then used to guide the encoding
process, allowing for more efficient compression by exploiting the correlation
between the input image and the reference dictionary. According to our
theoretical analysis, the proposed conditional latent coding (CLC) method is
robust to perturbations in the external dictionary samples and the selected
conditioning latent, with an error bound that scales logarithmically with the
dictionary size, ensuring stability even with large and diverse dictionaries.
Experimental results on benchmark datasets show that our new method improves
the coding performance by a large margin (up to 1.2 dB) with a very small
overhead of approximately 0.5% bits per pixel. Our code is publicly available
at https://github.com/ydchen0806/CLC.
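
To make the idea of correlation-based matching against a feature dictionary concrete, here is a minimal sketch in plain Python. It is not the authors' CLM or CLS module. It assumes cosine similarity as the correlation measure, a top-k selection, and a softmax-weighted average as the synthesis step. The function names, the toy vectors, and the `k` and `temperature` parameters are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_top_k(query, dictionary, k):
    """Return the k (score, index) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query, entry), idx)
              for idx, entry in enumerate(dictionary)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def synthesize_reference(query, dictionary, k=2, temperature=0.1):
    """Blend the top-k dictionary entries into one reference vector.

    Assumption: weights are a softmax over similarity scores. The paper's
    learned synthesis is more involved; this only illustrates the data flow.
    """
    matches = match_top_k(query, dictionary, k)
    max_score = max(score for score, _ in matches)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((score - max_score) / temperature) for score, _ in matches]
    total = sum(weights)
    weights = [w / total for w in weights]

    reference = [0.0] * len(query)
    for w, (_, idx) in zip(weights, matches):
        for d, value in enumerate(dictionary[idx]):
            reference[d] += w * value
    indices = [idx for _, idx in matches]
    return reference, indices, weights


# Toy dictionary of three feature vectors (hypothetical values).
toy_dictionary = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
toy_query = [1.0, 0.0, 0.0]
reference, indices, weights = synthesize_reference(toy_query, toy_dictionary, k=2)
```

In this toy setup the query matches entries 0 and 2 most closely, and the synthesized reference is a weighted blend of those two vectors. In a conditional codec, such a reference would be supplied as side information to both the encoder and the decoder, so only the residual information about the input needs to be coded.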