3DM-WeConvene: Learned Image Compression with 3D Multi-Level Wavelet-Domain Convolution and Entropy Model
Journal:
arXiv
Published Date:
Apr 7, 2025
Abstract
Learned image compression (LIC) has recently made significant progress,
surpassing traditional methods. However, most LIC approaches operate mainly in
the spatial domain and lack mechanisms for reducing frequency-domain
correlations. To address this, we propose a novel framework that integrates
low-complexity 3D multi-level Discrete Wavelet Transform (DWT) into
convolutional layers and entropy coding, reducing both spatial and channel
correlations to improve frequency selectivity and rate-distortion (R-D)
performance.
Our proposed 3D multi-level wavelet-domain convolution (3DM-WeConv) layer
first applies a 3D multi-level DWT (e.g., the 5/3 and 9/7 wavelets from JPEG
2000) to transform data into the wavelet domain. Convolutions with different
kernel sizes are then applied to the different frequency subbands, followed by
an inverse 3D DWT to return to the spatial domain. The 3DM-WeConv layer can be
flexibly inserted into existing CNN-based LIC models.
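As a rough illustration of the transform, filter, and invert pattern described above, the following Python sketch implements one level of the reversible 5/3 lifting transform (the integer filter used in JPEG 2000) along a single axis, then filters the low and high bands with kernels of different sizes. It is a minimal 1D sketch and not the 3DM-WeConv layer itself: the function names (`dwt53_forward`, `dwt53_inverse`, `conv1d_same`, `weconv_1d`), the boundary handling, and the fixed integer kernels are assumptions made for illustration, whereas the actual layer uses learned kernels and a multi-level 3D transform.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length list of ints."""
    assert len(x) % 2 == 0 and len(x) >= 2
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    high = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # symmetric extension
        high.append(odd[i] - (even[i] + right) // 2)
    # Update step: each even sample plus a rounded quarter of the neighbouring details.
    low = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]  # symmetric extension
        low.append(even[i] + (left + high[i] + 2) // 4)
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by running the lifting steps in reverse order."""
    half = len(low)
    even = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]
        even.append(low[i] - (left + high[i] + 2) // 4)
    odd = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        odd.append(high[i] + (even[i] + right) // 2)
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


def conv1d_same(signal, kernel):
    """Zero-padded 'same' correlation with an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def weconv_1d(x, low_kernel, high_kernel):
    """Hypothetical 1D analogue of the layer: DWT, per-band filtering, inverse DWT."""
    low, high = dwt53_forward(x)
    low = conv1d_same(low, low_kernel)     # e.g. a larger kernel on the low band
    high = conv1d_same(high, high_kernel)  # e.g. a smaller kernel on the high band
    return dwt53_inverse(low, high)


x = [10, 12, 14, 13, 9, 7, 8, 20]
# Identity kernels of size 3 (low band) and size 1 (high band) leave the signal unchanged.
assert weconv_1d(x, [0, 1, 0], [1]) == x
```

With identity kernels the sketch reduces to a perfect-reconstruction round trip, which is a useful sanity check. A 3D version could apply the same 1D step separably along each axis, and further decomposition levels could recurse on the low band; both are omitted here for brevity.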
We also introduce a 3D wavelet-domain channel-wise autoregressive entropy
model (3DWeChARM), which performs slice-based entropy coding in the 3D DWT
domain. Low-frequency (LF) slices are encoded first to provide priors for
high-frequency (HF) slices.
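The LF-first ordering can be sketched as a coding schedule. The Python snippet below is an illustrative assumption, not the 3DWeChARM implementation: the helper `coding_schedule` and the slice names are hypothetical, and it assumes, as in generic channel-wise autoregressive models, that each slice may be conditioned on every slice coded before it.

```python
def coding_schedule(slices):
    """Return (slice_name, context_names) pairs in coding order.

    `slices` is a list of (name, band) tuples with band "LF" or "HF".
    LF slices are coded first; every slice sees all previously coded slices.
    """
    ordered = [s for s in slices if s[1] == "LF"] + [s for s in slices if s[1] == "HF"]
    schedule = []
    coded = []
    for name, _band in ordered:
        schedule.append((name, list(coded)))  # context = slices already coded
        coded.append(name)
    return schedule


# Hypothetical slice layout: two LF slices and two HF slices, given in arbitrary order.
slices = [("hf0", "HF"), ("lf0", "LF"), ("hf1", "HF"), ("lf1", "LF")]
for name, context in coding_schedule(slices):
    print(name, "<-", context)
```

In this schedule the first LF slice has an empty slice context, and every HF slice has all LF slices available as priors, which is the property the entropy model relies on.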
A two-step training strategy is adopted: first balancing LF and HF rates,
then fine-tuning with separate weights.
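The abstract does not state the training objective. One plausible reading, sketched below in Python, is a rate-distortion loss in which the LF and HF rates first share a common weight and are then weighted separately during fine-tuning. The function `rd_loss`, its parameters, and the numeric values are assumptions for illustration only.

```python
def rd_loss(rate_lf, rate_hf, distortion, lam, w_lf=1.0, w_hf=1.0):
    """Weighted rate-distortion loss: w_lf * R_LF + w_hf * R_HF + lam * D."""
    return w_lf * rate_lf + w_hf * rate_hf + lam * distortion


# Step 1 (assumed): both rate terms share the same weight.
step1 = rd_loss(rate_lf=0.5, rate_hf=0.25, distortion=0.5, lam=2.0)

# Step 2 (assumed): fine-tune with separate, hypothetical weights for LF and HF rates.
step2 = rd_loss(rate_lf=0.5, rate_hf=0.25, distortion=0.5, lam=2.0, w_lf=1.0, w_hf=0.5)
```

Here `lam` plays the usual role of trading rate against distortion, while `w_lf` and `w_hf` are the separate weights whose values would have to come from the paper.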
Extensive experiments demonstrate that our framework consistently outperforms
state-of-the-art CNN-based LIC methods in both R-D performance and
computational complexity, with larger gains for high-resolution images. On the
Kodak, Tecnick 100, and CLIC test sets, our method achieves BD-Rates of
-12.24%, -15.51%, and -12.97%, respectively, relative to H.266/VVC (i.e.,
bitrate savings at equal quality).