Investigation of machine learning algorithms for taxonomic classification of marine metagenomes.

Journal: Microbiology spectrum
PMID:

Abstract

Taxonomic profiling of microbial communities is essential to model microbial interactions and inform habitat conservation. This work develops methods for constructing training and testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test accuracy in metagenomic classification and to guide improvements in ML approaches. Our study provides insights into the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future ML approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement.

Authors

  • Helen Park
    Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China.
  • Shen Jean Lim
    Cooperative Institute for Marine and Atmospheric Studies, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA.
  • Jonathan Cosme
    Run:AI, Office of the CTO, Tel Aviv, Israel.
  • Kyle O'Connell
    Deloitte Consulting LLP, Biomedical Data Science Team, Arlington, Virginia, USA.
  • Jilla Sandeep
    Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, Texas, USA.
  • Felimon Gayanilo
    Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, Texas, USA.
  • George R Cutter
    Southwest Fisheries Science Center, Antarctic Ecosystem Research Division, National Oceanic and Atmospheric Administration, La Jolla, California, USA.
  • Enrique Montes
    Cooperative Institute for Marine and Atmospheric Studies, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA.
  • Chotinan Nitikitpaiboon
    Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan.
  • Sam Fisher
    Deloitte Consulting LLP, Biomedical Data Science Team, Arlington, Virginia, USA.
  • Hassan Moustahfid
    NOAA/US Integrated Ocean Observing System (IOOS), Silver Spring, Maryland, USA.
  • Luke R Thompson
    Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Miami, Florida, USA.