Empirical Privacy Variance
Journal: arXiv
Published Date: Mar 16, 2025
Abstract
We propose the notion of empirical privacy variance and study it in the
context of differentially private fine-tuning of language models. Specifically,
we show that models calibrated to the same $(\varepsilon, \delta)$-DP guarantee
using DP-SGD with different hyperparameter configurations can exhibit
significant variations in empirical privacy, which we quantify through the lens
of memorization. We investigate the generality of this phenomenon across
multiple dimensions and discuss why it is surprising and relevant. Through
regression analysis, we examine how individual and composite hyperparameters
influence empirical privacy. The results reveal a no-free-lunch trade-off:
existing practices of hyperparameter tuning in DP-SGD, which focus on
optimizing utility under a fixed privacy budget, often come at the expense of
empirical privacy. To address this, we propose refined heuristics for
hyperparameter selection that explicitly account for empirical privacy, showing
that they are both precise and practically useful. Finally, we take preliminary
steps toward understanding empirical privacy variance: we propose two hypotheses,
identify limitations in existing techniques such as privacy auditing, and outline
open questions for future research.
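
For reference, the two objects the abstract relies on can be written out as follows. This is the standard textbook formulation, not notation taken from the paper itself; the symbols $M$, $D$, $D'$, $S$, $\theta_t$, $g_i$, $\mathcal{B}_t$, $B$, $C$, $\sigma$, $\eta$, and $T$ are assumptions introduced here for illustration. A randomized mechanism $M$ is $(\varepsilon, \delta)$-DP if, for all neighboring datasets $D, D'$ and all measurable output sets $S$,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta.$$

A single DP-SGD step on a batch $\mathcal{B}_t$ of (expected) size $B$ clips each per-example gradient $g_i = \nabla_\theta \ell(\theta_t; x_i)$ to norm at most $C$, adds Gaussian noise with noise multiplier $\sigma$, and updates with learning rate $\eta$:

$$\theta_{t+1} = \theta_t - \frac{\eta}{B} \left( \sum_{i \in \mathcal{B}_t} \frac{g_i}{\max\left(1, \lVert g_i \rVert_2 / C\right)} + \mathcal{N}\left(0, \sigma^2 C^2 I\right) \right).$$

Under this sketch, and for a fixed dataset size, different choices of $B$, the number of steps $T$, and $\sigma$ can be calibrated to the same $(\varepsilon, \delta)$, while $\eta$ does not enter the guarantee at all. This is the setting in which the abstract reports that empirical privacy can still differ across configurations.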