Association rule mining of the human gut microbiome.

Journal: Science China. Life sciences
Published Date:

Abstract

The human gut carries a vast and diverse microbial community that is essential for human health. Understanding the structure of this complex community is a crucial step toward comprehending human-microbiome interactions. Traditional co-occurrence and correlation analyses typically focus on pairwise relationships and ignore higher-order relationships. Association rule mining (ARM) is a well-developed data mining technique that has been applied to human microbiome data to identify higher-order relationships. Yet, existing attempts suffer from small sample sizes and low taxonomic resolution. We developed an advanced ARM framework and used it to systematically investigate the interactions between microbial species in large-scale, uniformly processed public human microbiome data from curatedMetagenomicData (CMD). First, we inferred association rules in the gut microbiome samples of healthy individuals (n=2,815) in CMD. Then, we compared these rules with those inferred from individuals with different diseases: inflammatory bowel disease (IBD, n=768), colorectal cancer (CRC, n=368), impaired glucose tolerance (IGT, n=199), and type 2 diabetes (T2D, n=164). Finally, we demonstrated that ARM is an efficient feature selection tool that can improve the performance of microbiome-based disease classification. Together, this study illustrates the higher-order microbial relationships in the human gut microbiome and highlights the critical importance of incorporating association rules in microbiome-based disease classification.
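
To make the notion of a higher-order association rule concrete, the following is a minimal sketch of rule mining on species presence/absence data. It is not the framework developed in this study: the sample sets, species labels, thresholds, and function names are all hypothetical, and it enumerates itemsets by brute force rather than using an optimized algorithm such as Apriori or FP-growth. It computes the standard ARM measures of support, confidence, and lift.

```python
from itertools import combinations

# Hypothetical toy data: each sample is the set of species detected in it.
samples = [
    {"sp_A", "sp_B", "sp_C"},
    {"sp_A", "sp_B"},
    {"sp_A", "sp_C"},
    {"sp_B", "sp_C"},
    {"sp_A", "sp_B", "sp_C"},
]


def support(itemset, samples):
    """Fraction of samples that contain every species in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= s for s in samples) / len(samples)


def mine_rules(samples, min_support=0.3, min_confidence=0.6, max_size=3):
    """Brute-force search for rules antecedent -> consequent.

    Returns tuples (antecedent, consequent, support, confidence, lift).
    The thresholds are illustrative assumptions, not values from the study.
    """
    species = sorted(set().union(*samples))
    rules = []
    for size in range(2, max_size + 1):
        for itemset in combinations(species, size):
            supp = support(itemset, samples)
            if supp < min_support:
                continue  # skip infrequent itemsets
            # Split the itemset into every non-empty antecedent/consequent pair.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(x for x in itemset if x not in antecedent)
                    confidence = supp / support(antecedent, samples)
                    lift = confidence / support(consequent, samples)
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


rules = mine_rules(samples)
for antecedent, consequent, supp, confidence, lift in rules:
    print(f"{set(antecedent)} -> {set(consequent)}: "
          f"support={supp:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A rule with two or more species in the antecedent, such as {sp_A, sp_B} -> {sp_C}, is the kind of higher-order relationship that pairwise co-occurrence or correlation analysis cannot express.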

Authors

  • Yiyan Zhang
    Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, 27599, USA.
  • Shanlin Ke
    Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, 02115, USA.
  • Xu-Wen Wang
    Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
  • Yizhou Sun
    College of Computer and Information Science, Northeastern University, 360 Huntington Avenue, Boston, MA, USA.
  • Scott T Weiss
    Harvard Medical School; Department of Medicine and Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA.
  • Yang-Yu Liu
    Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA. yyl@channing.harvard.edu.