phepy: Visual Benchmarks and Improvements for Out-of-Distribution Detectors
Journal: arXiv
Published Date: Mar 7, 2025
Abstract
Applying machine learning to increasingly high-dimensional problems with
sparse or biased training data increases the risk that a model is used on
inputs outside its training domain. For such out-of-distribution (OOD) inputs,
the model can no longer make valid predictions, and its error is potentially
unbounded.
Testing OOD detection methods on real-world datasets is complicated by the
ambiguity around which inputs are in-distribution (ID) or OOD. We design a
benchmark for OOD detection that includes three novel and easily visualisable
toy examples. These simple examples provide direct and intuitive insight into
whether a detector can (1) detect linear concepts, (2) detect non-linear
concepts, and (3) identify thin ID subspaces (needles) within high-dimensional
spaces (haystacks). We use our benchmark to evaluate the performance of various
methods from the literature.
Since tangible examples of OOD inputs may benefit OOD detection, we also
review several simple methods to synthesise OOD inputs for supervised training.
We introduce two improvements, $t$-poking and OOD sample weighting, to make
supervised detectors more precise at the ID-OOD boundary. This is especially
important when conflicts between real ID and synthetic OOD samples blur the
decision boundary.
Finally, we provide recommendations for constructing and applying
out-of-distribution detectors in machine learning.
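
As a rough illustration of the needle-in-a-haystack setting and of synthesising OOD inputs, the following sketch builds a thin ID subspace in three dimensions and draws synthetic OOD samples uniformly from a padded bounding box. It is not the phepy implementation: the function names, the `margin` and `thickness` parameters, and the distance-based weighting are all assumptions made for this example, and the weighting shown is only one plausible way to down-weight synthetic OOD samples that conflict with real ID data, not necessarily the paper's OOD sample weighting.

```python
import numpy as np

def make_needle(n, dim, rng, thickness=0.01):
    """ID samples: a thin 'needle' along the first axis of a dim-dimensional 'haystack'."""
    x = rng.normal(scale=thickness, size=(n, dim))
    x[:, 0] = rng.uniform(-1.0, 1.0, size=n)
    return x

def synthesise_ood(x_id, n, rng, margin=0.5):
    """Synthetic OOD samples drawn uniformly from the padded bounding box of the ID data."""
    lo = x_id.min(axis=0) - margin
    hi = x_id.max(axis=0) + margin
    return rng.uniform(lo, hi, size=(n, x_id.shape[1]))

def nearest_id_distance(x, x_id):
    """Euclidean distance from each row of x to its nearest ID sample."""
    diff = x[:, None, :] - x_id[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

rng = np.random.default_rng(0)
x_id = make_needle(200, 3, rng)
x_ood = synthesise_ood(x_id, 500, rng)

# Hypothetical weighting (an assumption for this sketch): synthetic OOD samples
# that land close to real ID data receive a small weight, so they pull less on
# a supervised detector near the ID-OOD boundary.
dist = nearest_id_distance(x_ood, x_id)
weights = dist / dist.max()
```

The arrays `x_id` and `x_ood`, together with `weights`, could then be passed to any supervised classifier that accepts per-sample weights, with ID samples labelled 0 and synthetic OOD samples labelled 1.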