Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression.

Journal: Genome biology

Abstract

Alignment-free methods, which are more time- and memory-efficient than alignment-based methods, have been widely used to compare genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated from sequencing samples can be overestimated relative to the dissimilarity calculated from the corresponding genomes, and that this bias can significantly degrade the performance of alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann), that successfully adjusts for this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.
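The overestimation described above can be illustrated with a small simulation. The following is a minimal Python sketch, not Afann's implementation. It uses a plain Euclidean distance between normalized k-mer frequency vectors as the alignment-free dissimilarity. The genome length, k = 5, the 1% divergence between the two genomes, the 1x coverage of 100 bp reads and the 2% substitution error rate are all arbitrary assumptions chosen for illustration, and no neural-network adjustment is performed. Finite coverage and sequencing errors add independent noise to each sample's k-mer profile, so the read-based distance between two closely related genomes tends to come out larger than the genome-based one. This is the kind of gap that Afann's regression step is meant to correct.

```python
import random
from collections import Counter
from math import sqrt

ALPHABET = "ACGT"

def kmer_freqs(seqs, k):
    """Normalized k-mer frequencies pooled over one or more sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def euclidean(f1, f2):
    """Euclidean distance between two sparse frequency vectors (illustrative choice)."""
    keys = set(f1) | set(f2)
    return sqrt(sum((f1.get(x, 0.0) - f2.get(x, 0.0)) ** 2 for x in keys))

def mutate(seq, rate, rng):
    """Replace each base with a different random base with probability `rate`."""
    out = []
    for b in seq:
        if rng.random() < rate:
            out.append(rng.choice([c for c in ALPHABET if c != b]))
        else:
            out.append(b)
    return "".join(out)

def sample_reads(genome, coverage, read_len, error_rate, rng):
    """Hypothetical read simulator: uniformly placed reads with substitution errors."""
    n_reads = int(coverage * len(genome) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(mutate(genome[start:start + read_len], error_rate, rng))
    return reads

rng = random.Random(0)
genome_a = "".join(rng.choice(ALPHABET) for _ in range(20000))
genome_b = mutate(genome_a, 0.01, rng)  # closely related genome (assumed 1% divergence)
k = 5

# Dissimilarity computed directly from the two genomes.
d_genome = euclidean(kmer_freqs([genome_a], k), kmer_freqs([genome_b], k))

# Dissimilarity computed from simulated sequencing samples of the same genomes.
reads_a = sample_reads(genome_a, 1.0, 100, 0.02, rng)
reads_b = sample_reads(genome_b, 1.0, 100, 0.02, rng)
d_reads = euclidean(kmer_freqs(reads_a, k), kmer_freqs(reads_b, k))

print(f"genome-based: {d_genome:.4f}  read-based: {d_reads:.4f}")
```

Changing the coverage or error rate in this sketch changes how large the gap is, which suggests why a fixed correction is unlikely to suffice and a learned adjustment that depends on such sample characteristics is attractive.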

Authors

  • Kujin Tang
    Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.
  • Jie Ren
    Digital Clinical Measures, Translational Medicine, Merck & Co., Inc., Rahway, NJ, USA.
  • Fengzhu Sun
    Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA. Email: fsun@usc.edu