Machine Learning Strategy for Gut Microbiome-Based Diagnostic Screening of Cardiovascular Disease.

Journal: Hypertension (Dallas, Tex. : 1979)

Abstract

Cardiovascular disease (CVD) is the leading cause of human mortality. Besides genetic and environmental factors, the gut microbiota has emerged in recent years as a new factor influencing CVD. Although cause-effect relationships are not clearly established, the reported associations between alterations in gut microbiota and CVD are prominent. We therefore hypothesized that machine learning (ML) could be used for gut microbiome-based diagnostic screening of CVD. To test this hypothesis, fecal 16S ribosomal RNA sequencing data from 478 CVD and 473 non-CVD human subjects collected through the American Gut Project were analyzed using 5 supervised ML algorithms: random forest, support vector machine, decision tree, elastic net, and neural networks. Thirty-nine differential bacterial taxa were identified between the CVD and non-CVD groups. ML models using these taxonomic features achieved a testing area under the receiver operating characteristic curve (AUC; 0.0, perfect antidiscrimination; 0.5, random guessing; 1.0, perfect discrimination) of ≈0.58 (random forest and neural networks). Next, the ML models were trained with the top 500 high-variance operational taxonomic unit (OTU) features instead of bacterial taxa, and an improved testing AUC of ≈0.65 (random forest) was achieved. Further, limiting the selection to only the top 25 highly contributing OTU features significantly enhanced the AUC to ≈0.70. Overall, our study is the first to identify dysbiosis of gut microbiota in CVD patients as a group and apply this knowledge to develop a gut microbiome-based ML approach for diagnostic screening of CVD.
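Two steps named in the abstract can be illustrated in isolation: ranking OTU features by variance to keep the top k, and reading a testing AUC on the 0.0 / 0.5 / 1.0 scale. The following is a minimal, standard-library-only sketch under stated assumptions; it is not the authors' pipeline. The function names `top_variance_features` and `roc_auc` are hypothetical, the input is assumed to be a plain subjects-by-OTUs abundance table (any normalization the study applied is not described here), and the AUC is computed with the rank-sum (Mann-Whitney) identity rather than by tracing the ROC curve.

```python
from statistics import pvariance


def top_variance_features(samples, k):
    """Return the column indices of the k highest-variance features.

    Assumption: `samples` is a list of equal-length rows, one row per
    subject and one column per OTU abundance value.
    """
    n_features = len(samples[0])
    # Population variance of each feature (column) across all subjects.
    variances = [pvariance([row[j] for row in samples]) for j in range(n_features)]
    # Highest variance first; ties are broken by the lower column index.
    ranked = sorted(range(n_features), key=lambda j: (-variances[j], j))
    return ranked[:k]


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive (label 1)
    is scored above a randomly chosen negative (label 0), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy OTU table: 3 subjects x 3 features (hypothetical values).
    table = [[0, 5, 1], [0, 1, 1], [0, 9, 2]]
    print(top_variance_features(table, 2))  # feature 1 varies most, then 2

    labels = [1, 1, 0, 0]  # 1 = CVD, 0 = non-CVD (toy labels)
    print(roc_auc(labels, [0.9, 0.8, 0.3, 0.1]))  # perfect discrimination
    print(roc_auc(labels, [0.1, 0.3, 0.8, 0.9]))  # perfect antidiscrimination
    print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # equivalent to random guessing
```

The three toy score vectors reproduce the endpoints quoted in the abstract: 1.0 when every CVD subject outranks every non-CVD subject, 0.0 when the ordering is fully reversed, and 0.5 when the scores carry no information. On this scale, the reported values of ≈0.58 to ≈0.70 indicate modest discrimination.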

Authors

  • Sachin Aryal
    Center for Hypertension and Precision Medicine, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, OH, USA.
  • Ahmad Alimadadi
    Center for Hypertension and Precision Medicine, Program in Physiological Genomics, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, OH, USA.
  • Ishan Manandhar
    Center for Hypertension and Precision Medicine, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, OH, USA.
  • Bina Joe
    Center for Hypertension and Precision Medicine, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, OH, USA.
  • Xi Cheng
    Genes, Cognition, and Psychosis Program, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA; The Lieber Institute for Brain Development, Baltimore, MD, USA; Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology (OCICB), National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health, Rockville, MD, USA.