FastSK: fast sequence analysis with gapped string kernels.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences, even with modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation, because their running time depends exponentially on the sub-sequence feature length, the number of mismatch positions, and the task's alphabet size.
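
To make the cost concrete, the following is a minimal sketch of a naive gapped k-mer kernel under one common formulation: every length-g window of a sequence is reduced to k informative positions, and two sequences are compared by counting shared gapped k-mers over every choice of those positions. It is an illustration only, not the FastSK algorithm or any gkm-SVM implementation; the function names and the parameters g and k are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, g, positions):
    """Count gapped k-mers: for each length-g window of seq,
    keep only the characters at the given offsets (hypothetical helper)."""
    return Counter(
        tuple(seq[i + p] for p in positions)
        for i in range(len(seq) - g + 1)
    )


def naive_gkm_kernel(x, y, g, k):
    """Naive gapped k-mer kernel between sequences x and y.

    Assumed definition: sum, over every way of choosing k informative
    positions out of g, of the dot product of gapped k-mer counts.
    """
    total = 0
    # There are C(g, k) position subsets, so this loop grows
    # combinatorially with the feature length and the number of gaps.
    for positions in combinations(range(g), k):
        cx = gapped_kmer_counts(x, g, positions)
        cy = gapped_kmer_counts(y, g, positions)
        total += sum(n * cy[key] for key, n in cx.items())
    return total


print(naive_gkm_kernel("ACGT", "AGGT", g=3, k=2))
```

With g = 3 and k = 2 there are only three position subsets, but the number of subsets is C(g, k) and the number of distinct gapped k-mers per subset can reach the alphabet size raised to the power k, which is the kind of growth the abstract describes.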

Authors

  • Derrick Blakely
    Department of Computer Science, University of Virginia, Charlottesville, VA, USA.
  • Eamon Collins
    Department of Computer Science, University of Virginia, Charlottesville, VA, USA.
  • Ritambhara Singh
  • Andrew Norton
    Department of Computer Science, University of Virginia, Charlottesville, VA, USA.
  • Jack Lanchantin
  • Yanjun Qi