BACKGROUND: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of ...
The explosive growth of genomic data provides an opportunity to make increased use of sequence variations for phenotype prediction. We have developed a prediction machine for quantitative phenotypes (WhoGEM) that overcomes some of the bottlenecks lim...
Machine learning has demonstrated potential in analyzing large, complex biological data. In practice, however, biological information is required in addition to machine learning for successful application.
Predicting the effects of genetic variants on splicing is highly relevant for human genetics. We describe the framework MMSplice (modular modeling of splicing) with which we built the winning model of the CAGI5 exon skipping prediction challenge. The...
State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogen...
The diversity of processed transcripts in eukaryotic genomes poses a challenge for the classification of their biological functions. Sparse sequence conservation in non-coding sequences and the unreliable nature of RNA structure predictions further e...
Transcriptional enhancers regulate spatio-temporal gene expression. While genomic assays can identify putative enhancers en masse, assigning target genes is a complex challenge. We devised a machine learning approach, McEnhancer, which links target g...
Using a powerful machine learning approach, a recent study of human genomes has revealed widespread footprints of recent positive selection on standing genetic variation.
SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currentl...