Efficient and Stable Multi-Dimensional Kolmogorov-Smirnov Distance
Journal:
arXiv
Published Date:
Apr 15, 2025
Abstract
We revisit extending the Kolmogorov-Smirnov distance between probability
distributions to the multidimensional setting and make new arguments about the
proper way to approach this generalization. Our proposed formulation maximizes
the difference over orthogonal dominating rectangular ranges (d-sided
rectangles in R^d), and is an integral probability metric. We also prove that
the distance between a distribution and a sample from the distribution
converges to 0 as the sample size grows, and bound this rate. Moreover, we show
that one can, up to this same approximation error, compute the distance
efficiently in 4 or fewer dimensions; specifically the runtime is near-linear
in the size of the sample needed for that error. With this, we derive a
delta-precision two-sample hypothesis test using this distance. Finally, we
show these metric and approximation properties do not hold for other popular
variants.