Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Journal:
arXiv
Published Date:
Mar 3, 2025
Abstract
Many large-scale systems rely on high-quality deep representations
(embeddings) to facilitate tasks like retrieval, search, and generative
modeling. Matryoshka Representation Learning (MRL) recently emerged as a
solution for adaptive embedding lengths, but it requires full model retraining
and suffers from noticeable performance degradations at short lengths. In this
paper, we show that sparse coding offers a compelling alternative for achieving
adaptive representation with minimal overhead and higher fidelity. We propose
Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained
embeddings into a high-dimensional but selectively activated feature space. By
leveraging lightweight autoencoding and task-aware contrastive objectives, CSR
preserves semantic quality while allowing flexible, cost-effective inference at
different sparsity levels. Extensive experiments on image, text, and multimodal
benchmarks demonstrate that CSR consistently outperforms MRL in terms of both
accuracy and retrieval speed-often by large margins-while also cutting training
time to a fraction of that required by MRL. Our results establish sparse coding
as a powerful paradigm for adaptive representation learning in real-world
applications where efficiency and fidelity are both paramount. Code is
available at https://github.com/neilwen987/CSR_Adaptive_Rep