Using Machine Learning to Measure Relatedness Between Genes: A Multi-Features Model.

Journal: Scientific Reports
Published Date:

Abstract

Measuring conditional relatedness between a pair of genes is a fundamental technique and remains a significant challenge in computational biology. Such relatedness can be assessed by gene expression similarities, but this approach suffers from high false discovery rates. Meanwhile, other types of features, e.g., prior-knowledge based similarities, are only viable for measuring global relatedness. In this paper, we propose a novel machine learning model, named Multi-Features Relatedness (MFR), for accurately measuring conditional relatedness between a pair of genes by incorporating expression similarities with prior-knowledge based similarities in an assessment criterion. MFR is used to predict gene-gene interactions extracted from the COXPRESdb, KEGG, HPRD, and TRRUST databases, using 10-fold cross-validation and test verification, and to identify gene-gene interactions collected from the GeneFriends and DIP databases for further verification. The results show that MFR achieves the highest area under the curve (AUC) values for identifying gene-gene interactions in the development, test, and DIP datasets. Specifically, it obtains an average improvement of 1.1% in precision for detecting gene pairs with both high expression similarities and high prior-knowledge based similarities in all datasets, compared with other linear models and coexpression analysis methods. For cancer gene network construction and gene function prediction, MFR also obtains results with greater biological significance and higher average prediction accuracy than the other compared models and methods. A website for the MFR model and relevant datasets can be accessed at http://bmbl.sdstate.edu/MFR .
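
As a rough illustration of the idea summarized above (not the authors' MFR implementation), the sketch below combines an expression similarity with a prior-knowledge based similarity into one relatedness score per gene pair and evaluates the ranking by AUC. The weighted average, the weight `alpha`, the function names, and the toy data are all assumptions made for illustration; the abstract does not specify the actual features or learning model.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0 here.
    return cov / (sx * sy) if sx and sy else 0.0

def combined_score(expr_sim, prior_sim, alpha=0.5):
    """Hypothetical criterion: weighted average of the two similarities."""
    return alpha * expr_sim + (1 - alpha) * prior_sim

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative pairs")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy gene pairs (made up): expression profile A, expression profile B,
# prior-knowledge similarity in [0, 1], and a known-interaction label.
pairs = [
    ([1, 2, 3, 4], [2, 4, 6, 8], 0.9, 1),
    ([1, 2, 3, 4], [1, 3, 2, 5], 0.7, 1),
    ([1, 2, 3, 4], [4, 1, 3, 2], 0.2, 0),
    ([1, 2, 3, 4], [2, 2, 3, 2], 0.1, 0),
]
scores = [combined_score(pearson(a, b), prior) for a, b, prior, _ in pairs]
labels = [label for *_, label in pairs]
print("AUC on toy pairs:", auc(scores, labels))
```

In practice the combination would be learned from labeled gene pairs and assessed with cross-validation, as the abstract describes, rather than fixed by hand as it is in this sketch.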

Authors

  • Yan Wang
    College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China.
  • Sen Yang
    Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
  • Jing Zhao
    Department of Pharmacy, Pharmacoepidemiology and Drug Safety Research Group, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway.
  • Wei Du
    Department of Respiratory and Critical Care Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
  • Yanchun Liang
    College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, P. R. China.
  • Cankun Wang
    Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, 57006, USA.
  • Fengfeng Zhou
  • Yuan Tian
    Department of Geriatrics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
  • Qin Ma
    Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, GA 30602, USA; BioEnergy Science Center, TN 37831, USA.