High-Order Convolutional Neural Network Architecture for Predicting DNA-Protein Binding Sites.

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published Date:

Abstract

Although deep learning algorithms have outperformed conventional methods in predicting the sequence specificities of DNA-protein binding, they fail to account for the dependencies among nucleotides and for the diverse binding lengths of different transcription factors (TFs). To address these two limitations simultaneously, we propose a high-order convolutional neural network architecture (HOCNN), which employs a high-order encoding method to model high-order dependencies among nucleotides and a multi-scale convolutional layer to capture motif features of different lengths. Experimental results on real ChIP-seq datasets show that the proposed method outperforms the state-of-the-art deep learning method (DeepBind) in the motif discovery task. In addition, we provide further insights into the importance of introducing additional convolutional kernels and into the degeneration problem that arises when high-order encoding is introduced in the motif discovery task.
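
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "high-order encoding" means one-hot encoding each overlapping k-mer into a vector of length 4^k, and that the multi-scale layer applies kernels of several widths followed by global max pooling. The function names, the kernel widths (2, 3, 4), and the random kernel weights are hypothetical and chosen only for illustration.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def high_order_encode(seq, order=2):
    """One-hot encode each overlapping k-mer (k = order) of seq.

    Assumed encoding, not taken from the paper: the k-mer at each position
    is mapped to one of 4**order classes. Only A, C, G and T are handled;
    any other symbol (e.g. N) raises a KeyError.
    Returns an array of shape (len(seq) - order + 1, 4**order).
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    n_windows = len(seq) - order + 1
    if n_windows <= 0:
        raise ValueError("sequence is shorter than the encoding order")
    encoded = np.zeros((n_windows, 4 ** order), dtype=np.float32)
    for pos in range(n_windows):
        code = 0
        for base in seq[pos:pos + order]:
            code = code * 4 + index[base]  # base-4 index of the k-mer
        encoded[pos, code] = 1.0
    return encoded


def multi_scale_max_features(encoded, kernels):
    """Slide each kernel along the sequence axis and keep its maximum response.

    Each kernel has shape (width, 4**order); kernels may differ in width,
    which is what lets them respond to motifs of different lengths.
    """
    features = []
    for kernel in kernels:
        width = kernel.shape[0]
        n_pos = encoded.shape[0] - width + 1
        responses = [np.sum(encoded[p:p + width] * kernel) for p in range(n_pos)]
        features.append(max(responses))  # global max pooling per kernel
    return np.array(features)


# Hypothetical usage: second-order encoding and three kernel widths.
rng = np.random.default_rng(0)
x = high_order_encode("ACGTAC", order=2)  # shape (5, 16)
kernels = [rng.normal(size=(w, x.shape[1])) for w in (2, 3, 4)]
feats = multi_scale_max_features(x, kernels)  # one feature per kernel
```

With `order=1` the encoder reduces to ordinary one-hot encoding, so the order parameter is the only thing that distinguishes the high-order input from the conventional one in this sketch.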

Authors

  • Qinhu Zhang
  • Lin Zhu
    Institute of Environmental Technology, College of Environmental and Resource Sciences; Zhejiang University, Hangzhou 310058, China.
  • De-Shuang Huang