Evaluation of deep learning approaches for modeling transcription factor sequence specificity.

Journal: Genomics
Published Date:

Abstract

As key components of gene regulation, transcription factors (TFs) play important roles in many biological processes. To fully understand the mechanisms underlying TF-mediated gene regulation, it is critical to accurately identify TF binding sites and predict their affinities. Recently, deep learning (DL) algorithms have achieved promising results in predicting DNA-TF binding; however, the various DL architectures have not been systematically compared, and the relative merit of each remains unclear. To address this problem, we applied four different DL architectures to SELEX-seq and HT-SELEX data covering three species and 35 families. We evaluated and compared the performance of the resulting models using 10-fold cross-validation. Our results indicate that the hybrid CNN + DNN model performs best. We expect that our study will be broadly applicable to modeling and predicting TF binding specificity as more high-throughput affinity data become available.

Authors

  • Yonglin Zhang
    State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, China.
  • Qi Mo
    Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China.
  • Li Xue
    HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China.
  • Jiesi Luo
    College of Chemistry, Sichuan University, Chengdu 610064, PR China.