Large-scale machine learning for metagenomics sequence classification.

Journal: Bioinformatics (Oxford, England)

Published Date: Nov 20, 2015

Abstract

MOTIVATION: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions.
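
As a rough illustration of the compositional idea mentioned above, the following Python sketch turns a DNA read into a fixed-length vector of k-mer counts, which is the kind of representation a classifier could then take as input. The function names `kmer_counts` and `kmer_vector`, the default `k`, and the choice to skip windows containing ambiguous bases are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(read, k=4):
    """Count every k-mer in a read using a sliding window of width k.

    Windows containing characters outside A, C, G, T (for example N)
    are skipped; this is an illustrative choice, not the paper's rule.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def kmer_vector(read, k=4):
    """Return the counts as a list of length 4**k, in lexicographic k-mer order."""
    counts = kmer_counts(read, k)
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


if __name__ == "__main__":
    vec = kmer_vector("ACGTACGT", k=2)
    print(len(vec), sum(vec))
```

The vector has one entry per possible k-mer, so its length grows as 4^k with the chosen k.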

Authors

Kévin Vervier
Bioinformatics Research Department, bioMérieux, 69280 Marcy-l'Étoile, MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 75248 Paris Cedex and INSERM U900, 75248 Paris Cedex, France.

Pierre Mahé
Bioinformatics Research Department, bioMérieux, 69280 Marcy-l'Étoile.

Maud Tournoud
Bioinformatics Research Department, bioMérieux, 69280 Marcy-l'Étoile.

Jean-Baptiste Veyrieras
Bioinformatics Research Department, bioMérieux, 69280 Marcy-l'Étoile.

Jean-Philippe Vert
MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 75248 Paris Cedex and INSERM U900, 75248 Paris Cedex, France.

Keywords

Algorithms; Machine Learning; Metagenome; Metagenomics; Sequence Analysis, DNA; Software

External Resources

PubMed ID: 26589281
