Prediction of regulatory motifs from human ChIP-sequencing data using a deep learning framework.

Journal: Nucleic acids research
Published Date:

Abstract

The identification of transcription factor binding sites and cis-regulatory motifs is a frontier where the rules governing protein-DNA binding are being revealed. Here, we developed a new method (DEep Sequence and Shape mOtif, or DESSO) for cis-regulatory motif prediction using deep neural networks and the binomial distribution model. DESSO outperformed existing tools, including DeepBind, in predicting motifs in 690 human ENCODE ChIP-sequencing datasets. Furthermore, the deep-learning framework of DESSO expanded motif discovery beyond the state of the art by allowing the identification of known and new protein-protein-DNA tethering interactions in human transcription factors (TFs). Specifically, 61 putative tethering interactions were identified among the 100 TFs expressed in the K562 cell line. In this work, the power of DESSO was further expanded by integrating the detection of DNA shape features. We found that shape information has strong predictive power for TF-DNA binding and provides new putative shape motif information for human TFs. Thus, DESSO improves the identification and structural analysis of TF binding sites by integrating the complexities of DNA binding into a deep-learning framework.

Authors

  • Jinyu Yang
    School of Computer and Software Engineering, Xihua University, Chengdu 610039, China.
  • Anjun Ma
    Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
  • Adam D Hoppe
    Department of Chemistry and Biochemistry, South Dakota State University, Brookings, SD 57007, USA.
  • Cankun Wang
    Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, 57006, USA.
  • Yang Li
    Occupation of Chinese Center for Disease Control and Prevention, Beijing, China.
  • Chi Zhang
    Department of Thoracic Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Yan Wang
    College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China.
  • Bingqiang Liu
  • Qin Ma
    Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, GA 30602, USA; BioEnergy Science Center, TN 37831, USA.