Advanced Tutorial: Label-Efficient Two-Sample Tests
Journal: arXiv
Published Date: Jan 7, 2025
Abstract
Hypothesis testing is a statistical inference approach for determining
whether data support a specific hypothesis. An important instance is the
two-sample test, which evaluates whether two sets of data points are drawn
from the same distribution. Such tests are widely used, for example by clinical
researchers comparing the effectiveness of treatments. This tutorial explores
two-sample testing in a setting where an analyst has access to many features
from two samples, but determining the sample membership (or label) of each
feature is costly. In machine learning, a similar scenario is studied in active learning.
This tutorial extends active learning concepts to two-sample testing in
this label-costly setting while maintaining statistical validity and
high testing power. The tutorial also discusses practical applications
of these label-efficient two-sample tests.
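
For readers who want a concrete reference point, the classical two-sample test described in the abstract can be sketched as a permutation test. This is a minimal illustration of the fully labeled baseline, not the label-efficient procedure the tutorial develops. The function name, the choice of the mean difference as test statistic, and the default parameters are assumptions made for this sketch; a mean-difference statistic only detects shifts in the mean, and other statistics would be needed for other kinds of distributional difference.

```python
import random
from statistics import mean


def permutation_two_sample_test(x, y, num_permutations=2000, seed=0):
    """Permutation test of H0: x and y are drawn from the same distribution.

    Hypothetical helper for illustration. The test statistic is the
    absolute difference in sample means. Returns a p-value in (0, 1].
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(num_permutations):
        # Under H0 the labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator counts the observed labeling,
    # which keeps the p-value valid for a finite number of permutations.
    return (exceed + 1) / (num_permutations + 1)


if __name__ == "__main__":
    # Synthetic data for illustration only: the second sample is shifted by 5.
    sample_a = [0.1 * i for i in range(20)]
    sample_b = [5 + 0.1 * i for i in range(20)]
    p_value = permutation_two_sample_test(sample_a, sample_b)
    print("reject H0 at level 0.05:", p_value < 0.05)
```

Note that this baseline needs the label of every data point to compute the statistic, which is exactly the cost the label-costly setting aims to reduce.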