Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The performance of basecalling has critical implications for all later step...
Protein function annotation has been one of the longstanding issues in biological sciences, and various computational methods have been developed. However, the existing methods suffer from a serious long-tail problem, with a large number of GO famili...
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identif...
Existing RNA velocity estimation methods strongly rely on predefined dynamics and cell-agnostic constant transcriptional kinetic rates, assumptions often violated in complex and heterogeneous single-cell RNA sequencing (scRNA-seq) data. Using a graph...
Many deep learning-based methods have been proposed to handle complex single-cell data. Deep learning approaches may also prove useful to jointly analyze single-cell RNA sequencing (scRNA-seq) and single-cell T cell receptor sequencing (scTCR-seq) da...
Linking cis-regulatory sequences to target genes has been a long-standing challenge. In this study, we introduce CREaTor, an attention-based deep neural network designed to model cis-regulatory patterns for genomic elements up to 2 Mb from target gen...
Differential analysis of bulk RNA-seq data often suffers from lack of good controls. Here, we present a generative model that replaces controls, trained solely on healthy tissues. The unsupervised model learns a low-dimensional representation and can...
BACKGROUND: Feature selection is an essential task in single-cell RNA-seq (scRNA-seq) data analysis and can be critical for gene dimension reduction and downstream analyses, such as gene marker identification and cell type classification. Most popula...
When analyzing data from in situ RNA detection technologies, cell segmentation is an essential step in identifying cell boundaries, assigning RNA reads to cells, and studying the gene expression and morphological features of cells. We developed a dee...
Malaria remains one of the deadliest infectious diseases. Transcriptional regulation effects of noncoding variants in this unusual genome of malaria parasites remain elusive. We developed a sequence-based, ab initio deep learning framework, MalariaSE...