Biosynthetic gene clusters (BGCs), key to synthesizing microbial secondary metabolites, are mostly hidden in microbial genomes and metagenomes. To unearth this vast potential, we present BGC-Prophet, a transformer-based language model for BGC predict...
Accurate annotation of coding regions in RNAs is essential for understanding gene translation. We developed a deep neural network to directly predict and analyze translation initiation and termination sites from RNA sequences. Trained with human tran...
Ribonucleic acid (RNA) is the central conduit for information transfer in the cell. Identifying potential RNA targets in disease conditions is a challenging task, given the vast repertoire of functional non-coding RNAs in a human cell. A potential dr...
Species-specific differences in protein translation can affect the design of protein-based drugs. Consequently, efficient expression of recombinant proteins often requires codon optimization. Publicly available optimization tools do not always result...
Spatial transcriptomics technology has revolutionized our understanding of cellular systems by capturing RNA transcript levels in their original spatial context. Single-cell spatial transcriptomics (scST) offers single-cell resolution expression leve...
Gene regulatory networks (GRNs) provide a global representation of how genetic/genomic information is transferred in living systems and are a key component in understanding genome regulation. Single-cell multiome data provide unprecedented opportunit...
In infected individuals, viruses are present as a population consisting of dominant and minor variant genomes. Most databases contain information on the dominant genome sequence. Since the emergence of SARS-CoV-2 in late 2019, variants have been sele...
Rates of transcription elongation vary within and across eukaryotic gene bodies. Here, we introduce new methods for predicting elongation rates from nascent RNA sequencing data. First, we devise a probabilistic model that predicts nucleotide-specific...
To understand the complex relationship between histone mark activity and gene expression, recent advances have used in silico predictions based on large-scale machine learning models. However, these approaches have omitted key contributing factors li...
Machine learning (ML) has shown great potential in the adaptive immune receptor repertoire (AIRR) field. However, there is a lack of large-scale ground-truth experimental AIRR data suitable for AIRR-ML-based disease diagnostics and therapeutics disco...