deepTFBS: Improving Within- and Cross-Species Prediction of Transcription Factor Binding Using Deep Multi-Task and Transfer Learning.

Journal: Advanced science (Weinheim, Baden-Wurttemberg, Germany)
Published Date:

Abstract

Accurate prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. This study presents deepTFBS, a comprehensive deep learning (DL) framework that builds a robust DNA language model of TF binding grammar to accurately predict TFBSs within and across plant species. Taking advantage of multi-task DL and transfer learning, deepTFBS leverages knowledge learned from large-scale TF binding profiles to enhance TFBS prediction in small-sample training and cross-species prediction tasks. When tested on available data for 359 Arabidopsis TFs, deepTFBS outperformed previously described prediction strategies, including position weight matrix, deepSEA, and DanQ, improving the area under the precision-recall curve (PRAUC) by 244.49%, 49.15%, and 23.32%, respectively. Further cross-species prediction of TFBSs in wheat showed that deepTFBS yielded a significant PRAUC improvement of 30.6% over these three baseline models. deepTFBS can also utilize information from gene conservation and binding motifs, enabling efficient TFBS prediction in species where experimental data are limited. A case study focusing on the WUSCHEL (WUS) transcription factor illustrated the potential use of deepTFBS in cross-species applications, here between Arabidopsis and wheat. deepTFBS is publicly available at https://github.com/cma2015/deepTFBS.
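
The PRAUC figures above may be easier to read with a small worked example. The following is a minimal sketch, not code from deepTFBS: the function names `pr_auc` and `relative_improvement` and the toy labels and scores are hypothetical. It estimates the area under the precision-recall curve as average precision and then expresses one model's gain over a baseline as a relative percentage. The abstract does not state how its improvements were computed; values above 100% suggest a relative measure, which is what this sketch assumes.

```python
def pr_auc(labels, scores):
    """Estimate the area under the precision-recall curve as average precision.

    labels: list of 0/1 ground-truth values (1 = bound site).
    scores: list of predicted scores, higher = more likely bound.
    Tied scores are not treated specially in this simplified sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline` (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0, 0]
model_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
baseline_scores = [0.3, 0.9, 0.2, 0.8, 0.1]

model_auc = pr_auc(labels, model_scores)
baseline_auc = pr_auc(labels, baseline_scores)
gain = relative_improvement(model_auc, baseline_auc)
print(f"model PRAUC={model_auc:.3f}, baseline PRAUC={baseline_auc:.3f}, gain={gain:.1f}%")
```

Under this reading, an improvement of more than 100% means the PRAUC more than doubled relative to the baseline. A PRAUC-style metric is often preferred over ROC-based metrics when positives are rare, as bound sites typically are among genomic windows.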

Authors

  • Jingjing Zhai
    State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Yuzhou Zhang
    State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
  • Chujun Zhang
    State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
  • Xiaotong Yin
    College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
  • Minggui Song
    State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
  • Chenglong Tang
    College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
  • Pengjun Ding
    State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
  • Zenglin Li
    State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
  • Chuang Ma
    State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China. cma@nwafu.edu.cn.
