Transcription Factor Binding Site Prediction Using CnNet Approach.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
PMID:

Abstract

Regulation of gene expression is one of the most important processes in a living organism, and understanding it makes it easier to identify different kinds of diseases and their causes. However, it is very difficult to know which factors control gene expression. A transcription factor (TF) is a protein that plays an important role in gene expression. Discovering transcription factors has immense biological significance; however, it is challenging to develop novel techniques and evaluation methods for regulatory processes in biological structures. In this research, we focus mainly on sequence specificities that can be ascertained from experimental data with deep learning techniques, which offer a scalable, flexible and unified computational approach for predicting transcription factor binding. Specifically, the Multiple EM for Motif Elicitation (MEME) technique is combined with a Convolutional Neural Network (CNN), in an approach named CnNet, to discover the sequence specificities in a dataset of DNA gene sequences. The process involves two steps: a) discovering motifs that are capable of identifying useful TF binding sites by using the MEME technique, and b) computing a score that indicates the likelihood of a given sequence being a useful binding site by using the CNN technique. The proposed CnNet approach predicts the TF binding score with much better accuracy than existing approaches.
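The scoring idea in step (b) can be illustrated with a minimal sketch. It is not the CnNet implementation: it assumes a single hand-made motif filter standing in for a motif found in step (a), a one-hot encoding of the sequence, max pooling over window positions, and a sigmoid output. The names `one_hot`, `scan` and `binding_score`, the toy motif `TGACTCA`, and the bias value are all hypothetical choices made for illustration.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases become all-zero rows."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def scan(seq, motif):
    """Slide a (width, 4) motif filter along the sequence; one activation per window."""
    width = motif.shape[0]
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif filter")
    x = one_hot(seq)
    n_windows = len(seq) - width + 1
    return np.array([np.sum(x[i:i + width] * motif) for i in range(n_windows)])


def binding_score(seq, motif, bias=-4.0):
    """Max-pool the window activations and squash to (0, 1) with a sigmoid.

    The bias is an assumed, untrained value; a real model would learn it.
    """
    pooled = scan(seq, motif).max() + bias
    return 1.0 / (1.0 + np.exp(-pooled))


# Toy filter (assumption): +1 where a base matches the motif, -1 elsewhere,
# so each window's activation is (matches - mismatches).
toy_motif = one_hot("TGACTCA") * 2.0 - 1.0

bound = "ACGTTGACTCAGGC"       # contains the toy motif
background = "AAAAAAAAAAAAAA"  # does not contain it

print(binding_score(bound, toy_motif))
print(binding_score(background, toy_motif))
```

In a trained network the filter weights and bias would be learned from labelled sequences, and many filters would be combined by further layers; the fixed filter here only shows how a convolutional scan turns a motif into a per-sequence score.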

Authors

  • M Mohamed Divan Masood
  • D Manjula
  • Vijayan Sugumaran
    Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI, USA; Department of Decision and Information Sciences, School of Business Administration, Oakland University, Rochester, MI, USA. Electronic address: sugumara@oakland.edu.