Keeping up with the genomes: efficient learning of our increasing knowledge of the tree of life.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Current metagenomic classifiers struggle computationally to keep pace with the training data generated by genome sequencing projects, such as the exponentially growing NCBI RefSeq bacterial genome database. When new reference sequences are added to the training data, statically trained classifiers must be retrained on all data, which is highly inefficient. The rich literature on "incremental learning" addresses the need to update an existing classifier to accommodate new data without sacrificing much accuracy compared to retraining the classifier on all data.

Authors

  • Zhengqiao Zhao
    Ecological and Evolutionary Signal-processing and Informatics (EESI) Lab, Department of Electrical and Computer Engineering, Drexel University, Market Street, Philadelphia, US.
  • Alexandru Cristian
    Department of Computer Science, Drexel University, Market Street, Philadelphia, US.
  • Gail Rosen