A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing
Journal: arXiv
Published Date: Jun 8, 2025
Abstract
Modern data marketplaces and data sharing consortia increasingly rely on
incentive mechanisms to encourage agents to contribute data. However, schemes
that reward agents based on the quantity of submitted data are vulnerable to
manipulation, as agents may submit fabricated or low-quality data to inflate
their rewards. Prior work has proposed comparing each agent's data against
others' to promote honesty: when others contribute genuine data, the best way
to minimize discrepancy is to do the same. Yet prior implementations of this
idea rely on very strong assumptions about the data distribution (e.g.
Gaussian), limiting their applicability. In this work, we develop reward
mechanisms based on a novel two-sample test inspired by the Cramér-von Mises
statistic. Our methods strictly incentivize agents to submit more genuine data,
while disincentivizing data fabrication and other types of untruthful
reporting. We establish that truthful reporting constitutes a (possibly
approximate) Nash equilibrium in both Bayesian and prior-agnostic settings. We
theoretically instantiate our method in three canonical data sharing problems
and show that it relaxes key assumptions made by prior work. Empirically, we
demonstrate that our mechanism incentivizes truthful data sharing via
simulations and on real-world language and image data.
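
To make the comparison idea concrete, the following is a minimal sketch of the classical two-sample Cramér-von Mises statistic, which measures the discrepancy between two empirical distribution functions. It is not the paper's novel test or its reward mechanism, neither of which is specified in the abstract. The function names (`ecdf`, `cvm_two_sample`), the sample sizes, and the simulated "genuine" and "fabricated" submissions are all hypothetical and chosen only for illustration.

```python
import bisect
import random


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function t -> P(X <= t)."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect.bisect_right(xs, t) / n


def cvm_two_sample(x, y):
    """Classical two-sample Cramer-von Mises statistic.

    T = n*m / (n+m)^2 * sum over pooled points z of (F_n(z) - G_m(z))^2,
    where F_n and G_m are the empirical CDFs of x and y.
    """
    n, m = len(x), len(y)
    F, G = ecdf(x), ecdf(y)
    pooled = list(x) + list(y)
    return (n * m / (n + m) ** 2) * sum((F(z) - G(z)) ** 2 for z in pooled)


# Illustrative assumption: the other agents report genuine N(0, 1) draws.
rng = random.Random(0)
others = [rng.gauss(0.0, 1.0) for _ in range(500)]

# One hypothetical agent reports honestly, another fabricates its data.
genuine = [rng.gauss(0.0, 1.0) for _ in range(200)]
fabricated = [rng.uniform(3.0, 4.0) for _ in range(200)]

t_genuine = cvm_two_sample(genuine, others)
t_fabricated = cvm_two_sample(fabricated, others)

print("discrepancy (genuine):   ", t_genuine)
print("discrepancy (fabricated):", t_fabricated)
```

In this sketch the fabricated submission yields a much larger discrepancy than the genuine one, which is the intuition behind rewarding agents for a low discrepancy against others' data. The statistic uses only empirical CDFs, so it does not require a parametric assumption such as Gaussianity.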