A deep boosting based approach for capturing the sequence binding preferences of RNA-binding proteins from high-throughput CLIP-seq data.

Journal: Nucleic acids research
PMID:

Abstract

Characterizing the binding behaviors of RNA-binding proteins (RBPs) is important for understanding their functional roles in gene expression regulation. However, current high-throughput experimental methods for identifying RBP targets, such as CLIP-seq and RNAcompete, often suffer from false negatives. Here, we develop a deep-boosting-based machine learning approach, called DeBooster, to accurately model the sequence binding preferences of RBPs and identify their binding targets from CLIP-seq data. Comprehensive validation tests show that DeBooster can outperform other state-of-the-art approaches in RBP target prediction. In addition, we demonstrate that DeBooster may provide new insights into the regulatory functions of RBPs, including the effects of binding by the RNA helicase MOV10 on mRNA degradation, the potentially different ADAR1 binding behaviors related to its editing activity, and the antagonizing effect of RBP binding on miRNA repression. Moreover, DeBooster may provide an effective index for investigating the effects of pathogenic mutations in RBP binding sites, especially those related to splicing events. We expect that DeBooster will be widely applied to the analysis of large-scale CLIP-seq data and will serve as a practical tool for novel biological discoveries about the regulatory mechanisms of RBPs. The source code of DeBooster can be downloaded from http://github.com/dongfanghong/deepboost.

Authors

  • Shuya Li
    School of Life Sciences, Tsinghua University, Beijing 100084, China.
  • Fanghong Dong
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
  • Yuexin Wu
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
  • Sai Zhang
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
  • Chen Zhang
    Department of Dermatology, Affiliated Jinling Hospital, Medical School of Nanjing University, Nanjing, China.
  • Xiao Liu
  • Tao Jiang
    Department of Respiratory and Critical Care Medicine, Center for Respiratory Medicine, the Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China.
  • Jianyang Zeng
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China. Electronic address: zengjy321@tsinghua.edu.cn.