FastSK: fast sequence analysis with gapped string kernels.
Journal: Bioinformatics (Oxford, England)
Published Date: Dec 30, 2020
Abstract
MOTIVATION: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences when trained on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation, because their running time grows exponentially with the sub-sequence feature length, the number of mismatch positions, and the task's alphabet size.
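
To make the scaling concern concrete, the following is a minimal, naive sketch of a gapped k-mer kernel in Python. It is an illustration only, not the FastSK algorithm and not any existing gkm-SVM implementation. The names `g` (sub-sequence feature length), `m` (number of mismatch positions), and all function names are assumptions introduced here. Under this sketch's feature definition, the explicit feature space has C(g, m) · |Σ|^(g − m) entries, which grows quickly with `g`, `m`, and the alphabet size |Σ|.

```python
from collections import Counter
from itertools import combinations
from math import comb


def gapped_kmer_counts(seq, g, m):
    """Count gapped k-mer features of a sequence (illustrative definition).

    For every length-g window, every choice of m positions is masked out;
    the feature is the pair (kept positions, characters at those positions).
    """
    counts = Counter()
    for start in range(len(seq) - g + 1):
        window = seq[start:start + g]
        for masked in combinations(range(g), m):
            kept = tuple(i for i in range(g) if i not in masked)
            key = (kept, "".join(window[i] for i in kept))
            counts[key] += 1
    return counts


def gapped_kmer_kernel(x, y, g, m):
    """Naive kernel value: dot product of the two feature-count vectors."""
    cx = gapped_kmer_counts(x, g, m)
    cy = gapped_kmer_counts(y, g, m)
    return sum(count * cy[key] for key, count in cx.items())


def feature_space_size(alphabet_size, g, m):
    """Number of distinct features under the definition used in this sketch."""
    return comb(g, m) * alphabet_size ** (g - m)


if __name__ == "__main__":
    print(gapped_kmer_kernel("ACGT", "ACGT", g=3, m=1))
    # DNA alphabet (4 letters) versus a protein alphabet (20 letters)
    print(feature_space_size(4, g=10, m=4))
    print(feature_space_size(20, g=10, m=4))
```

Comparing `feature_space_size` for a 4-letter DNA alphabet and a 20-letter protein alphabet at the same `g` and `m` shows why alphabet size matters as much as feature length in this naive formulation.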