List-Level Distribution Coupling with Applications to Speculative Decoding and Lossy Compression
Journal:
arXiv
Published Date:
Jun 5, 2025
Abstract
We study a relaxation of the problem of coupling probability distributions --
a list of samples is generated from one distribution and an accept is declared
if any one of these samples is identical to the sample generated from the other
distribution. We propose a novel method for generating samples, which extends
the Gumbel-max sampling suggested in Daliri et al. (arXiv:2408.07978) for
coupling probability distributions. We also establish a corresponding lower
bound on the acceptance probability, which we call the list matching lemma. We
next discuss two applications of our setup. First, we develop a new mechanism
for multi-draft speculative sampling that is simple to implement and achieves
performance competitive with baselines such as SpecTr and SpecInfer across a
range of language tasks. Our method also guarantees a certain degree of drafter
invariance with respect to the output tokens which is not supported by existing
schemes. We also provide a theoretical lower bound on the token level
acceptance probability. As our second application, we consider distributed
lossy compression with side information in a setting where a source sample is
compressed and available to multiple decoders, each with independent side
information. We propose a compression technique that is based on our
generalization of Gumbel-max sampling and show that it provides significant
gains in experiments involving synthetic Gaussian sources and the MNIST image
dataset.