Identification of city specific important bacterial signature for the MetaSUB CAMDA challenge microbiome data.
Journal:
Biology direct
Published Date:
Jul 24, 2019
Abstract
BACKGROUND: Metagenomic data of whole genome sequences (WGS) from samples across several cities around the globe may unravel city specific signatures of microbes. Illumina MiSeq sequencing data was provided from 12 cities in 7 different countries as part of the 2018 CAMDA "MetaSUB Forensic Challenge", including also samples from three mystery sets. We used appropriate machine learning techniques on this massive dataset to effectively identify the geographical provenance of "mystery" samples. Additionally, we pursued compositional data analysis to develop accurate inferential techniques for such microbiome data. It is expected that this current data, which is of higher quality and higher sequence depth compared to the CAMDA 2017 MetaSUB challenge data, along with improved analytical techniques would yield many more interesting, robust and useful results that can be beneficial for forensic analysis.