Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox.

Journal: Genome Biology
PMID:

Abstract

The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, a loss that could be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de.

Authors

  • Jakob Wirbel
    Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany.
  • Konrad Zych
    Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany.
  • Morgan Essex
    Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany.
  • Nicolai Karcher
    Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany.
  • Ece Kartal
    Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany.
  • Guillem Salazar
    Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093, Zürich, Switzerland.
  • Peer Bork
    Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany.
  • Shinichi Sunagawa
    Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093, Zürich, Switzerland.
  • Georg Zeller
    Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany. zeller@embl.de.