A machine learning-based framework for modeling transcription elongation.

Journal: Proceedings of the National Academy of Sciences of the United States of America
PMID:

Abstract

RNA polymerase II (Pol II) generally pauses at certain positions along gene bodies, thereby interrupting the transcription elongation process, which is often coupled with various important biological functions, such as precursor mRNA splicing and gene expression regulation. Characterizing transcriptional elongation dynamics can thus help us understand many essential biological processes in eukaryotic cells. However, experimentally measuring Pol II elongation rates is generally time- and resource-consuming. We developed PEPMAN (polymerase II elongation pausing modeling through attention-based deep neural network), a deep learning-based model that accurately predicts Pol II pausing sites from native elongating transcript sequencing (NET-seq) data. By taking full advantage of the attention mechanism, PEPMAN is able to decipher important sequence features underlying Pol II pausing. More importantly, we demonstrated that analyzing the PEPMAN-predicted results around various types of alternative splicing sites can provide useful clues for understanding cotranscriptional splicing events. In addition, associating the PEPMAN prediction results with different epigenetic features can help reveal important factors related to the transcription elongation process. Together, these results demonstrate that PEPMAN is a useful and effective tool for modeling transcription elongation and understanding the related biological factors from available high-throughput sequencing data.

Authors

  • Peiyuan Feng
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
  • An Xiao
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
  • Meng Fang
    Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China.
  • Fangping Wan
    Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Shuya Li
    School of Life Sciences, Tsinghua University, Beijing 100084, China.
  • Peng Lang
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
  • Dan Zhao
    Key Laboratory of Hunan Province for Water Environment and Agriculture Product Safety, College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China.
  • Jianyang Zeng
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China. Electronic address: zengjy321@tsinghua.edu.cn.