MB-SupCon: Microbiome-based Predictive Models via Supervised Contrastive Learning.

Journal: Journal of Molecular Biology
PMID:

Abstract

The human microbiome consists of trillions of microorganisms. Microbiota can modulate host physiology through molecular and metabolite interactions. Integrating microbiome and metabolomics data has the potential to predict different diseases more accurately. Yet most datasets measure only microbiome data, without paired metabolome data. Here, we propose a novel integrative modeling framework, the Microbiome-based Supervised Contrastive Learning Framework (MB-SupCon). MB-SupCon integrates microbiome and metabolome data to generate microbiome embeddings, which can be used to improve prediction accuracy in datasets that measure only microbiome data. As a proof of concept, we applied MB-SupCon to 720 samples with paired 16S microbiome data and metabolomics data from patients with type 2 diabetes. MB-SupCon outperformed existing prediction methods and achieved high average prediction accuracies for insulin resistance status (84.62%), sex (78.98%), and race (80.04%). Moreover, the microbiome embeddings form separable clusters for different covariate groups in the lower-dimensional space, which enhances data visualization. We also applied MB-SupCon to a large inflammatory bowel disease study and observed similar advantages. Thus, MB-SupCon could be broadly applied to improve microbiome prediction models in multi-omics disease studies.
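The abstract does not state the loss function, encoder architecture, or hyperparameters, so the following is only a minimal sketch of one way a supervised contrastive objective could couple the two data types. It assumes a generic supervised contrastive (SupCon-style) loss applied across modalities: each microbiome embedding is an anchor, and every metabolome embedding whose sample shares the same covariate label (for example, insulin resistance status) counts as a positive. The function names `cross_modal_supcon_loss` and `symmetric_loss`, the temperature of 0.5, and the symmetric averaging are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean norm (guarding against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def cross_modal_supcon_loss(z_anchor, z_contrast, labels, temperature=0.5):
    """Hypothetical supervised contrastive loss across two modalities.

    z_anchor, z_contrast: (n, d) embeddings of the same n paired samples,
        e.g. microbiome embeddings as anchors, metabolome as contrast set.
    labels: (n,) covariate labels; same-label pairs are treated as positives.
    temperature: assumed scaling constant for the cosine similarities.
    """
    a = l2_normalize(z_anchor)
    b = l2_normalize(z_contrast)
    logits = a @ b.T / temperature                        # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive mask: same label in the other modality (always includes the paired sample).
    positives = (labels[:, None] == labels[None, :]).astype(float)
    # Average log-probability over each anchor's positives, then over anchors.
    mean_log_prob_pos = (positives * log_prob).sum(axis=1) / positives.sum(axis=1)
    return -mean_log_prob_pos.mean()


def symmetric_loss(z_micro, z_metab, labels, temperature=0.5):
    """Assumed symmetric variant: average of microbiome->metabolome and metabolome->microbiome."""
    return 0.5 * (
        cross_modal_supcon_loss(z_micro, z_metab, labels, temperature)
        + cross_modal_supcon_loss(z_metab, z_micro, labels, temperature)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0, 0, 1, 1])      # e.g. insulin-sensitive vs insulin-resistant
    z_micro = rng.normal(size=(4, 8))    # stand-in microbiome embeddings
    z_metab = rng.normal(size=(4, 8))    # stand-in metabolome embeddings
    print(symmetric_loss(z_micro, z_metab, labels))
```

Minimizing such a loss pulls same-label samples together across the two modalities. That is one plausible reason the microbiome embeddings could form label-separable clusters and remain useful when, at prediction time, only microbiome data are available and the metabolome branch is dropped.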

Authors

  • Sen Yang
    Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
  • Shidan Wang
    Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, 5325 Harry Hines Blvd, Dallas, TX, 75390, USA.
  • Yiqing Wang
    Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, United States. Electronic address: lucy@mail.smu.edu.
  • Ruichen Rong
    Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas.
  • Jiwoong Kim
    Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.
  • Bo Li
    Electric Power Research Institute, Yunnan Power Grid Co., Ltd., Kunming, Yunnan, China.
  • Andrew Y Koh
    Harold C. Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States; Department of Microbiology, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States; Department of Paediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States. Electronic address: Andrew.Koh@UTSouthwestern.edu.
  • Guanghua Xiao
  • Qiwei Li
    Department of General Surgery, South Campus, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China.
  • Dajiang J Liu
    Pennsylvania State University, Department of Public Health Sciences, 700 HMC Crescent Road, Hershey, PA 17033, USA.
  • Xiaowei Zhan
    Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas. Electronic address: xiaowei.zhan@utsouthwestern.edu.