Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation
Journal:
arXiv
Published Date:
Jul 9, 2025
Abstract
Disentangled and interpretable latent representations in generative models
typically come at the cost of generation quality. The $\beta$-VAE framework
introduces a hyperparameter $\beta$ to balance disentanglement and
reconstruction quality, where setting $\beta > 1$ introduces an information
bottleneck that favors disentanglement over sharp, accurate reconstructions. To
address this trade-off, we propose a novel generative modeling framework that
leverages a range of $\beta$ values to learn multiple corresponding latent
representations. First, we obtain a slew of representations by training a
single variational autoencoder (VAE), with a new loss function that controls
the information retained in each latent representation such that the higher
$\beta$ value prioritize disentanglement over reconstruction fidelity. We then,
introduce a non-linear diffusion model that smoothly transitions latent
representations corresponding to different $\beta$ values. This model denoises
towards less disentangled and more informative representations, ultimately
leading to (almost) lossless representations, enabling sharp reconstructions.
Furthermore, our model supports sample generation without input images,
functioning as a standalone generative model. We evaluate our framework in
terms of both disentanglement and generation quality. Additionally, we observe
smooth transitions in the latent spaces with respect to changes in $\beta$,
facilitating consistent manipulation of generated outputs.