Identifying complex motifs in massive omics data with a variable-convolutional layer in deep neural network.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Motif identification is among the most common and essential computational tasks in bioinformatics and genomics. Here we propose a novel convolutional layer for deep neural networks, named the variable convolutional (vConv) layer, for effective motif identification in high-throughput omics data by adaptively learning kernel length from the data. Empirical evaluations on DNA-protein binding and DNase footprinting cases demonstrate that vConv-based networks outperform their convolutional counterparts regardless of model complexity. Moreover, vConv can be readily integrated into multi-layer neural networks as an 'in-place replacement' for a canonical convolutional layer. All source code is freely available on GitHub for academic use.
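To illustrate what learning kernel length from data could look like, the following is a minimal NumPy sketch and not the authors' released implementation. It assumes one possible mechanism: a soft mask over kernel positions, bounded by two sigmoid edges, that is multiplied into a fixed-size kernel before an ordinary convolution. The names soft_length_mask, masked_conv1d, left, right, and sharpness are hypothetical, and the boundaries are fixed here, whereas in a trainable layer they would be parameters updated by gradient descent.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def soft_length_mask(max_len, left, right, sharpness=4.0):
    """Soft mask over kernel positions 0..max_len-1.

    Values are close to 1 between `left` and `right` and close to 0 outside.
    `left`, `right`, and `sharpness` are hypothetical parameters for this sketch.
    """
    pos = np.arange(max_len, dtype=float)
    return sigmoid(sharpness * (pos - left)) * sigmoid(sharpness * (right - pos))


def one_hot_dna(seq):
    """Encode a DNA string as an array of shape (len(seq), 4) in A, C, G, T order."""
    alphabet = "ACGT"
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, alphabet.index(base)] = 1.0
    return out


def masked_conv1d(x, kernel, mask):
    """Valid cross-correlation of x (L, 4) with kernel (K, 4) after masking kernel rows."""
    masked = kernel * mask[:, None]  # positions outside the mask contribute almost nothing
    k = kernel.shape[0]
    n = x.shape[0] - k + 1
    return np.array([np.sum(x[i:i + k] * masked) for i in range(n)])


rng = np.random.default_rng(0)
kernel = rng.normal(size=(12, 4))                  # fixed maximum kernel length of 12
mask = soft_length_mask(12, left=3.0, right=8.0)   # boundaries fixed for illustration
scores = masked_conv1d(one_hot_dna("ACGTACGTACGTACGTACGT"), kernel, mask)
effective_length = mask.sum()                      # approximate number of active positions
```

Because the mask is a smooth function of its two boundaries, the effective kernel length in such a design can shrink or grow during training without changing the shape of the weight tensor, which is consistent with the layer being usable as a replacement for a standard convolutional layer.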

Authors

  • Jing-Yi Li
    Department of Cariology and Endodontology, Peking University School and Hospital of Stomatology, Beijing, China.
  • Shen Jin
    Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
  • Xin-Ming Tu
    Biomedical Pioneering Innovation Center & Beijing Advanced Innovation Center for Genomics, Center for Bioinformatics, and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China.
  • Yang Ding
    Department of Pediatrics, Sainte-Justine University Hospital and University of Montreal, Montreal, Quebec, Canada.
  • Ge Gao
    School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213000, China.