Systematic evaluation of supervised machine learning for sample origin prediction using metagenomic sequencing data.

Journal: Biology Direct
Published Date:

Abstract

BACKGROUND: The advent of metagenomic sequencing provides microbial abundance patterns that can be leveraged for sample origin prediction. Supervised machine learning classification approaches have been reported to predict sample origin accurately when the origin has been previously sampled. Using metagenomic datasets provided by the 2019 CAMDA challenge, we evaluated the influence of varying technical, analytical and machine learning approaches on result interpretation and novel source prediction.

Authors

  • Julie Chih-Yu Chen
    National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, R3E 3R2, Canada. chih-yu.chen@canada.ca.
  • Andrea D Tyler
    National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, R3E 3R2, Canada.