Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems
Journal: arXiv
Published Date: Jul 9, 2025
Abstract
High-dimensional datasets often exhibit low-dimensional geometric structures,
as suggested by the manifold hypothesis, which posits that data lie on a
smooth manifold embedded in a higher-dimensional ambient space. While this
insight underpins many advances in machine learning and inverse problems, fully
leveraging it requires addressing three key tasks: estimating the intrinsic
dimension (ID) of the manifold, constructing appropriate local coordinates, and
learning mappings between ambient and manifold spaces. In this work, we propose
a framework that addresses all these challenges using a Mixture of Variational
Autoencoders (VAEs) and tools from Riemannian geometry. We specifically focus
on estimating the ID of datasets by analyzing the numerical rank of the VAE
decoder pullback metric. The estimated ID guides the construction of an atlas
of local charts using a mixture of invertible VAEs, enabling accurate manifold
parameterization and efficient inference. We show how this approach enhances
solutions to ill-posed inverse problems, particularly in biomedical imaging, by
enforcing that reconstructions lie on the learned manifold. Lastly, we explore
the impact of network pruning on manifold geometry and reconstruction quality,
showing that the intrinsic dimension serves as an effective proxy for
monitoring model capacity.
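
To make the rank-based ID estimate concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes a hypothetical `toy_decoder` standing in for a trained VAE decoder, a finite-difference Jacobian, a relative eigenvalue threshold `rel_tol`, and a median over sampled latent points; none of these choices are specified in the abstract. For a decoder with Jacobian J at a latent point z, the pullback metric is G(z) = J(z)^T J(z), and its numerical rank counts the latent directions that actually move the output.

```python
import numpy as np

def toy_decoder(z):
    # Hypothetical stand-in for a trained VAE decoder (an assumption):
    # 4 latent coordinates, 6 ambient coordinates, but the output depends
    # only on z[0] and z[1], so the image is a 2-dimensional surface.
    u, v = z[0], z[1]
    return np.array([u, v, u * v, np.sin(u), np.cos(v), u**2 + v**2])

def jacobian_fd(f, z, eps=1e-6):
    # Central finite-difference Jacobian, shape (ambient_dim, latent_dim).
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((f(z + step) - f(z - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def pullback_metric(f, z):
    # G(z) = J(z)^T J(z): the ambient Euclidean metric pulled back to latent space.
    J = jacobian_fd(f, z)
    return J.T @ J

def numerical_rank(G, rel_tol=1e-6):
    # Count eigenvalues above a relative threshold (rel_tol is an assumed value).
    eig = np.clip(np.linalg.eigvalsh(G), 0.0, None)
    return int(np.sum(eig > rel_tol * eig.max()))

# Evaluate the rank at several latent samples and aggregate with the median
# (the aggregation rule is also an assumption of this sketch).
rng = np.random.default_rng(0)
latents = rng.normal(size=(20, 4))
ranks = [numerical_rank(pullback_metric(toy_decoder, z)) for z in latents]
estimated_id = int(np.median(ranks))
print(estimated_id)
```

In this toy setting the latent space is deliberately over-parameterized, so the metric is singular and its rank, not the latent dimension, reflects the dimension of the decoded surface. With a real decoder, the Jacobian would more typically come from automatic differentiation, and the threshold would need tuning to the scale of the eigenvalue spectrum.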