Positive-Unlabelled learning for identifying new candidate Dietary Restriction-related genes among ageing-related genes.

Journal: Computers in biology and medicine
PMID:

Abstract

Dietary Restriction (DR) is one of the most popular anti-ageing interventions. Recently, Machine Learning (ML) has been explored to identify potential DR-related genes among ageing-related genes, aiming to minimize the costly wet-lab experiments needed to expand our knowledge of DR. However, to train a model from positive (DR-related) and negative (non-DR-related) examples, the existing ML approach naively labels genes without a known DR relation as negative examples, treating the lack of a DR-related annotation for a gene as evidence of absence of DR-relatedness rather than as absence of evidence. This undermines the reliability of the negative examples (non-DR-related genes) and the method's ability to identify novel DR-related genes. This work introduces a novel gene prioritization method based on the two-step Positive-Unlabelled (PU) Learning paradigm. Using a similarity-based, KNN-inspired approach, our method first selects reliable negative examples among the genes without known DR associations. These reliable negatives and all known positives are then used to train a classifier that effectively differentiates DR-related from non-DR-related genes, and the classifier is finally employed to generate a more reliable ranking of promising genes for novel DR-relatedness. Our method significantly outperforms (p<0.05) the existing state-of-the-art approach in three predictive accuracy metrics, with up to ∼40% lower computational cost in the best case, and we identify four new promising DR-related genes (PRKAB1, PRKAB2, IRS2, PRKAG1), all with evidence from the existing literature supporting their potential DR-related role.
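
The two-step PU Learning paradigm described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the neighbourhood size, the selection threshold, or the classifier, so everything below is an assumption made for illustration: Euclidean distance, `k=3`, a `max_pos_fraction` cut-off, and a nearest-centroid scorer standing in for the trained classifier. The gene identifiers and feature vectors are hypothetical toy values, not data from the study.

```python
import math


def select_reliable_negatives(features, labels, k=3, max_pos_fraction=0.0):
    """Step 1 (assumed form): keep an unlabelled gene as a reliable negative
    only if at most max_pos_fraction of its k nearest neighbours are known
    positives."""
    reliable = []
    for gene, vec in features.items():
        if labels[gene] == 1:
            continue  # known positives are never candidates for negatives
        neighbours = sorted(
            (g for g in features if g != gene),
            key=lambda g: math.dist(vec, features[g]),
        )[:k]
        pos_fraction = sum(labels[g] for g in neighbours) / len(neighbours)
        if pos_fraction <= max_pos_fraction:
            reliable.append(gene)
    return reliable


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_candidates(features, labels, k=3):
    """Step 2 (stand-in classifier): score every unlabelled gene by how much
    closer it lies to the positive centroid than to the reliable-negative
    centroid, then rank from most to least promising."""
    positives = [g for g in features if labels[g] == 1]
    negatives = select_reliable_negatives(features, labels, k=k)
    if not positives or not negatives:
        raise ValueError("need at least one positive and one reliable negative")
    pos_c = centroid([features[g] for g in positives])
    neg_c = centroid([features[g] for g in negatives])
    candidates = [g for g in features if labels[g] == 0]
    scores = {
        g: math.dist(features[g], neg_c) - math.dist(features[g], pos_c)
        for g in candidates
    }
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Hypothetical toy data: 1 = known positive, 0 = unlabelled.
features = {
    "pos1": [1.0, 1.0], "pos2": [1.2, 0.9], "pos3": [0.9, 1.1],
    "unl_near": [1.1, 1.0],
    "unl_far1": [5.0, 5.0], "unl_far2": [5.2, 4.9],
    "unl_far3": [4.9, 5.1], "unl_far4": [5.1, 5.2],
}
labels = {g: 1 if g.startswith("pos") else 0 for g in features}

reliable_negatives = select_reliable_negatives(features, labels)
ranking = rank_candidates(features, labels)
print(reliable_negatives)
print(ranking[0])
```

In this toy setting the unlabelled gene that sits among the known positives is excluded from the reliable negatives and ends up at the top of the ranking, which is the behaviour the two-step paradigm is meant to produce. A real application would replace the centroid scorer with a properly trained and validated classifier.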

Authors

  • Jorge Paz-Ruza
    LIDIA Group, CITIC, Universidade da Coruña, Campus de Elviña s/n, A Coruña 15071, Spain. Electronic address: j.ruza@udc.es.
  • Alex A Freitas
    School of Computing, University of Kent, Canterbury, Kent, CT2 7NF, UK.
  • Amparo Alonso-Betanzos
    Department of Computer Science, University of A Coruña, A Coruña, Spain. ciamparo@udc.es.
  • Bertha Guijarro-Berdiñas
    LIDIA Group, CITIC, Universidade da Coruña, Campus de Elviña s/n, A Coruña 15071, Spain. Electronic address: berta.guijarro@udc.es.