Deviation Error: assessing predictions for replicate measurements in genomics and beyond
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
A quantitative measurement is subject to variation, referred to here as measurement variation, which can be described by a probability distribution. Machine learning models typically produce a prediction corresponding to the mode of the measurement variation. The Deviation Error, a novel metric described here, assesses predictions while accounting for measurement variation. Measurement variation in genomics data was explored. Towards a general prescription for modelling genomics measurements, models were fit with different loss functions on synthetically generated data that mimic genomics measurements. Synthetic data make it possible to know the true underlying value and to control the forms and amounts of noise injected at different stages of data processing. Datasets were generated with varying levels of noise. Of the loss functions tried, only the Deviation Error yielded models that performed as well as, if not better than, the alternatives across all combinations of metrics and datasets.