Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.

Journal: BMC Bioinformatics

Abstract

BACKGROUND: Compared to traditional supervised machine learning approaches that employ fully labeled samples, positive-unlabeled (PU) learning techniques aim to classify "unlabeled" samples based on a smaller proportion of known positive examples. This more challenging modeling goal reflects many real-world scenarios in which negative examples are not available, which poses direct challenges to defining prediction accuracy and robustness. While several studies have evaluated predictions learned from only definitive positive examples, few have investigated whether correct classification of a high proportion of known positive (KP) samples from among unlabeled samples can act as a surrogate to indicate model quality.
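
The idea of using KP recovery as a surrogate for model quality, checked against a permutation null, can be illustrated with a minimal sketch. This is not the authors' method, which the text above does not specify. Everything in it is an assumption made for illustration: the synthetic Gaussian data, the nearest-centroid scorer, the hypothetical helper names `centroid` and `kp_recovery`, and the choice to hide 5 of 20 KPs. For simplicity, the toy unlabeled pool contains no true positives other than the hidden KPs.

```python
import random

def centroid(rows):
    """Column-wise mean of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kp_recovery(features, labels, n_hidden, rng):
    """Hide n_hidden labeled positives among the unlabeled samples, rank the
    unlabeled pool by closeness to the remaining positives, and return the
    fraction of hidden positives found in the top n_hidden ranks."""
    positives = [i for i, is_pos in enumerate(labels) if is_pos]
    hidden = set(rng.sample(positives, n_hidden))
    # Assumed stand-in for a PU model: score by distance to the positive centroid.
    center = centroid([features[i] for i in positives if i not in hidden])
    pool = [i for i, is_pos in enumerate(labels) if not is_pos or i in hidden]

    def sq_dist(i):
        return sum((x - c) ** 2 for x, c in zip(features[i], center))

    top = sorted(pool, key=sq_dist)[:n_hidden]
    return len(hidden.intersection(top)) / n_hidden

rng = random.Random(0)
N_FEATURES, N_KP, N_UNLABELED, N_HIDDEN, N_PERM = 5, 20, 80, 5, 199

# Synthetic data (assumption): KPs are shifted away from the unlabeled samples.
features = (
    [[rng.gauss(3.0, 1.0) for _ in range(N_FEATURES)] for _ in range(N_KP)]
    + [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(N_UNLABELED)]
)
labels = [True] * N_KP + [False] * N_UNLABELED

# Recovery rate with the real KP labels.
observed = kp_recovery(features, labels, N_HIDDEN, rng)

# Null distribution: the same statistic after shuffling which samples count as KP.
null = []
for _ in range(N_PERM):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    null.append(kp_recovery(features, shuffled, N_HIDDEN, rng))

# Permutation p-value with the usual +1 correction.
p_value = (1 + sum(r >= observed for r in null)) / (1 + N_PERM)
print(f"observed KP recovery: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

If the KP labels carry real signal, the observed recovery rate should sit well above the rates obtained with shuffled labels, giving a small p-value. A recovery rate that shuffled labels can match suggests that high KP recovery alone says little about model quality.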

Authors

  • Shiwei Xu
    Research Center for Agricultural Monitoring and Early Warning, Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China.
  • Margaret E Ackerman
    Thayer School of Engineering, Dartmouth College, Hanover, New Hampshire, United States of America.