A Method for Predicting DNA Motif Length Based On Deep Learning.

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published Date:

Abstract

A DNA motif is a sequence pattern shared by the DNA sequence segments that bind to a specific protein. Discovering motifs in a given DNA sequence dataset plays a vital role in studying gene expression regulation. The motif length is an important attribute of a DNA motif and directly affects the quality of the discovered motifs, yet determining it accurately remains a difficult challenge. We propose MotifLen, a new motif length prediction scheme based on supervised machine learning. First, we propose a method for constructing sample data for motif length prediction. Second, we build a deep learning model for motif length prediction based on a convolutional neural network. Third, we describe how to apply the prediction model to a motif found by an existing motif discovery algorithm. The experimental results show that i) the prediction accuracy of MotifLen exceeds 90% on the validation set and is significantly higher than that of the compared methods on real datasets, ii) MotifLen can successfully optimize the motifs found by existing motif discovery algorithms, and iii) MotifLen can effectively reduce the running time of some existing motif discovery algorithms.
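The abstract does not say how samples are built or how sequences are fed to the network. As a minimal sketch, assuming one-hot encoding of bases (A, C, G, T) and synthetic samples made by planting a random motif of known length into random background sequences (an illustrative assumption, not necessarily the MotifLen procedure), the Python below builds one labeled sample and shows how a single convolutional filter with global max pooling responds to it. The names `one_hot`, `make_sample`, and `conv1d_max` are hypothetical.

```python
import random

# Assumed column order for the one-hot encoding of nucleotides.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 rows, one row per base."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def make_sample(motif_len, n_seqs=10, seq_len=50, seed=None):
    """Build one hypothetical labeled sample.

    Each of n_seqs random background sequences gets the same random motif of
    length motif_len planted at a random position. Returns the sequences, the
    planted motif, and the label (the motif length).
    """
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(motif_len))
    seqs = []
    for _ in range(n_seqs):
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        start = rng.randrange(seq_len - motif_len + 1)
        # Overwrite motif_len bases of the background with the motif.
        background[start:start + motif_len] = motif
        seqs.append("".join(background))
    return seqs, motif, motif_len


def conv1d_max(encoded, kernel):
    """Slide a (width x 4) kernel along one encoded sequence ("valid"
    convolution) and return the maximum response (global max pooling)."""
    width = len(kernel)
    scores = []
    for i in range(len(encoded) - width + 1):
        score = sum(kernel[j][k] * encoded[i + j][k]
                    for j in range(width) for k in range(4))
        scores.append(score)
    return max(scores)


if __name__ == "__main__":
    seqs, motif, label = make_sample(8, n_seqs=5, seq_len=40, seed=1)
    # A filter equal to the one-hot motif acts as a matched filter.
    kernel = one_hot(motif)
    for s in seqs:
        print(s, conv1d_max(one_hot(s), kernel), "label:", label)
```

Because the kernel here is the one-hot motif itself, its peak response on every sequence equals the motif length. A trained CNN would instead learn its filters from many such labeled samples and map the pooled responses to a predicted length class.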

Authors

  • Qiang Yu
    State Key Laboratory of Soil Erosion and Dryland Farming on the Loess Plateau, Northwest A&F University, Yangling 712100, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China. Electronic address: yuq@nwsuaf.edu.cn.
  • Xiao Zhang
    Merck & Co., Inc., Rahway, NJ, USA.
  • Yana Hu
  • Shengpin Chen
  • Liying Yang
    School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China.