PopPhy-CNN: A Phylogenetic Tree Embedded Architecture for Convolutional Neural Networks to Predict Host Phenotype From Metagenomic Data.

Journal: IEEE journal of biomedical and health informatics
Published Date:

Abstract

Accurate prediction of the host phenotype from a metagenomic sample and identification of the associated microbial markers are important in understanding potential host-microbiome interactions related to disease initiation and progression. We introduce PopPhy-CNN, a novel convolutional neural network (CNN) learning framework that effectively exploits phylogenetic structure in microbial taxa for host phenotype prediction. Our approach takes as input a 2D matrix that represents the phylogenetic tree populated with the relative abundance of microbial taxa in a metagenomic sample. This conversion allows CNNs to explore the spatial relationship of the taxonomic annotations on the tree and their quantitative characteristics in metagenomic data. We show the competitiveness of our model compared to other available methods using nine metagenomic datasets of moderate size for binary classification. With synthetic and biological datasets, we show the superior and robust performance of our model for multi-class classification. Furthermore, we design a novel scheme for feature extraction from the learned CNN models and demonstrate improved performance when the extracted features are used. PopPhy-CNN is a practical deep learning framework for the prediction of host phenotype that also facilitates the retrieval of predictive microbial taxa.
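
The input representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy tree, the taxon names, the function names `populate` and `tree_to_matrix`, and the left-justified, zero-padded placement of nodes in each row are all assumptions made for illustration, and the paper's exact matrix layout may differ. The sketch assumes an internal node's abundance is the sum of its children's abundances and that each matrix row holds one level of the tree.

```python
import numpy as np

# Hypothetical toy tree: parent -> ordered list of children.
TREE = {
    "root": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["b1"],
}


def populate(tree, node, leaf_abundance, totals):
    """Store in totals[node] the leaf abundance, or the sum over its children."""
    children = tree.get(node, [])
    if not children:
        totals[node] = leaf_abundance.get(node, 0.0)
    else:
        totals[node] = sum(
            populate(tree, child, leaf_abundance, totals) for child in children
        )
    return totals[node]


def tree_to_matrix(tree, root, leaf_abundance):
    """Lay out the populated tree as a 2D matrix, one tree level per row.

    Nodes are placed left to right in each row and the remaining cells are
    zero-padded (an assumed layout, chosen for simplicity).
    """
    totals = {}
    populate(tree, root, leaf_abundance, totals)

    # Collect the nodes level by level, preserving the order of children.
    levels = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        frontier = [child for node in frontier for child in tree.get(node, [])]

    width = max(len(level) for level in levels)
    matrix = np.zeros((len(levels), width))
    for row, level in enumerate(levels):
        for col, node in enumerate(level):
            matrix[row, col] = totals[node]
    return matrix


# Relative abundances of the leaf taxa in one hypothetical sample.
matrix = tree_to_matrix(TREE, "root", {"a1": 0.2, "a2": 0.3, "b1": 0.5})
```

A matrix of this kind, built once per sample, could then be passed to a standard 2D convolutional network, so that neighboring cells correspond to taxa that are close on the tree.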

Authors

  • Derek Reiman
  • Ahmed A Metwally
  • Jun Sun
    School of Environmental and Chemical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu Province, PR China.
  • Yang Dai
    Institute of Cardiovascular Diseases, Shanghai Jiao Tong University School of Medicine, 197 Ruijin 2nd Road, Shanghai, 200025, PR China.