Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.
Journal:
BMC Bioinformatics
Published Date:
Jun 19, 2024
Abstract
BACKGROUND: Unlike traditional supervised machine learning approaches, which employ fully labeled samples, positive-unlabeled (PU) learning techniques aim to classify "unlabeled" samples based on a smaller proportion of known positive examples. This more challenging modeling goal reflects many real-world scenarios in which negative examples are not available, which poses direct challenges to defining prediction accuracy and robustness. While several studies have evaluated predictions learned from only definitive positive examples, few have investigated whether correct classification of a high proportion of known positive (KP) samples from among unlabeled samples can act as a surrogate indicator of model quality.
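The idea of using KP recovery as a surrogate for model quality can be illustrated with a minimal Python sketch. It is not the authors' procedure, which this excerpt does not describe. It assumes that some KP samples are hidden among the unlabeled samples, that a PU model has already assigned each unlabeled sample a score, and that recovery is measured as the fraction of hidden KPs ranked in the top scoring fraction. The function names, the `top_frac` cutoff, and the label-shuffling null are all hypothetical choices made for illustration.

```python
import numpy as np


def kp_recovery_rate(scores, is_hidden_kp, top_frac=0.1):
    """Fraction of hidden known positives ranked in the top `top_frac` of scores."""
    scores = np.asarray(scores, dtype=float)
    is_hidden_kp = np.asarray(is_hidden_kp, dtype=bool)
    n_top = max(1, int(round(top_frac * scores.size)))
    # Indices of the highest-scoring samples (stable sort keeps ties deterministic).
    top_idx = np.argsort(-scores, kind="stable")[:n_top]
    return is_hidden_kp[top_idx].sum() / is_hidden_kp.sum()


def permutation_p_value(scores, is_hidden_kp, top_frac=0.1, n_perm=1000, seed=0):
    """Compare the observed recovery rate with a null built by shuffling KP labels."""
    rng = np.random.default_rng(seed)
    is_hidden_kp = np.asarray(is_hidden_kp, dtype=bool)
    observed = kp_recovery_rate(scores, is_hidden_kp, top_frac)
    # Assumed null: hidden KPs are scattered at random among the unlabeled samples.
    null = np.array([
        kp_recovery_rate(scores, rng.permutation(is_hidden_kp), top_frac)
        for _ in range(n_perm)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Toy data: 10 unlabeled samples, of which the first two are hidden KPs.
toy_scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.05]
toy_hidden = [True, True, False, False, False, False, False, False, False, False]
observed_rate, p_value = permutation_p_value(toy_scores, toy_hidden, top_frac=0.2)
```

In this toy case both hidden KPs receive the two highest scores, so the observed recovery rate is 1.0. A small permutation p-value suggests that such recovery is unlikely under random ranking, although whether high KP recovery reflects overall model quality is the open question the abstract raises.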