Prediction of virus-host infectious association by supervised learning methods.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: The study of virus-host infectious association is important for understanding the functions and dynamics of microbial communities. Both cellular and fractionated viral metagenomic data generate a large number of viral contigs with missing host information. Although relatively simple methods based on the similarity between the word frequency vectors of viruses and bacterial hosts have been developed to study virus-host associations, the problem remains significantly understudied. We hypothesize that machine learning methods based on word frequencies can be efficiently used to study virus-host infectious associations.
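
To make the idea of a word frequency vector concrete, the following is a minimal sketch of the similarity-based baseline described above, not the authors' implementation. It assumes DNA sequences over the alphabet ACGT, a word length k = 2, and Manhattan distance as the dissimilarity measure; the function names (`kmer_frequencies`, `manhattan_distance`, `closest_host`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return the normalized k-mer (word) frequency vector of a DNA sequence.

    Words containing characters other than A, C, G, T are ignored.
    The vector is ordered lexicographically over all 4**k possible words.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def manhattan_distance(u, v):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def closest_host(virus_seq, hosts, k=2):
    """Return the name of the host whose word frequency vector is nearest.

    `hosts` maps a host name to its sequence (an assumed input format).
    """
    v = kmer_frequencies(virus_seq, k)
    return min(
        hosts,
        key=lambda name: manhattan_distance(v, kmer_frequencies(hosts[name], k)),
    )


# Toy sequences, invented for illustration only.
hosts = {"at_rich": "ATATTATAATAT", "gc_rich": "GCGCGGCCGCGC"}
predicted = closest_host("ATATATATAT", hosts)
```

In this toy case the viral sequence shares words only with the AT-rich host, so that host is selected. A supervised approach of the kind hypothesized in the abstract would instead use such vectors as input features to a trained classifier rather than ranking hosts by a fixed distance.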

Authors

  • Mengge Zhang
    Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA.
  • Lianping Yang
    College of Sciences, Northeastern University, Shenyang, China.
  • Jie Ren
    Digital Clinical Measures, Translational Medicine, Merck & Co., Inc., Rahway, NJ, United States.
  • Nathan A Ahlgren
    Department of Biological Sciences and Wrigley Institute for Environmental Studies, University of Southern California, Los Angeles, California, USA.
  • Jed A Fuhrman
    Department of Biological Sciences and Wrigley Institute for Environmental Studies, University of Southern California, Los Angeles, California, USA.
  • Fengzhu Sun
    Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA. fsun@usc.edu.