Probabilistic and machine-learning methods for predicting local rates of transcription elongation from nascent RNA sequencing data.

Journal: Nucleic acids research

PMID: 39964478

Abstract

Rates of transcription elongation vary within and across eukaryotic gene bodies. Here, we introduce new methods for predicting elongation rates from nascent RNA sequencing data. First, we devise a probabilistic model that predicts nucleotide-specific elongation rates as a generalized linear function of nearby genomic and epigenomic features. We validate this model with simulations and apply it to public PRO-seq (Precision Run-On Sequencing) and epigenomic data for four cell types, finding that reductions in local elongation rate are associated with cytosine nucleotides, DNA methylation, splice sites, RNA stem-loops, CTCF (CCCTC-binding factor) binding sites, and several histone marks, including H3K36me3 and H4K20me1. By contrast, increases in local elongation rate are associated with thymines, A+T-rich and low-complexity sequences, and H3K79me2 marks. We then introduce a convolutional neural network that improves our local rate predictions. Our analysis is the first to permit genome-wide predictions of relative nucleotide-specific elongation rates.
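
As a rough illustration of what a "generalized linear function" of local features could look like, the sketch below assumes a log link (rate = exp of a weighted feature sum) and assumes that expected read density is inversely proportional to the local rate. Both assumptions, the function names, the two toy features, and the coefficient values are hypothetical and are not taken from the paper. Only the signs follow the abstract (cytosines slower, thymines faster).

```python
import math

def local_rates(features, coefs):
    """Relative elongation rate per nucleotide under an assumed log link:
    rate_i = exp(sum_k coefs[k] * features[i][k])."""
    return [math.exp(sum(c * f for c, f in zip(coefs, row))) for row in features]

def expected_density(rates, scale=1.0):
    """Expected read density per nucleotide, assumed to be inversely
    proportional to the local rate (slower polymerase, more reads)."""
    return [scale / r for r in rates]

# Hypothetical per-nucleotide features: [is_cytosine, is_thymine]
features = [[1, 0], [0, 1], [0, 0]]
# Hypothetical coefficients: negative values slow elongation, positive speed it up
coefs = [-0.5, 0.5]

rates = local_rates(features, coefs)
density = expected_density(rates)
```

In this toy setting a nucleotide with no active features has relative rate 1, so the other rates can be read as fold changes against that baseline.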

Authors

Lingjie Liu
Huawei Technologies Co, Shenzhen, China.

Yixin Zhao
Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States.

Rebecca Hassett
Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States.

Shushan Toneyan
Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA.

Peter K Koo
Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States.

Adam Siepel
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA.

Keywords

Binding Sites; CCCTC-Binding Factor; DNA Methylation; Epigenomics; Histone Code; Histones; Humans; Machine Learning; Models, Statistical; Neural Networks, Computer; RNA Splice Sites; Sequence Analysis, RNA; Transcription Elongation, Genetic

