Weakly-Supervised Convolutional Neural Network Architecture for Predicting Protein-DNA Binding.

Journal: IEEE/ACM transactions on computational biology and bioinformatics

Published Date: Aug 7, 2018

Abstract

Although convolutional neural networks (CNNs) have outperformed conventional methods in predicting the sequence specificities of protein-DNA binding in recent years, they do not take full advantage of the weakly-supervised information intrinsic to DNA sequences, namely that a bound sequence may contain multiple transcription factor binding sites (TFBSs). Here, we propose a weakly-supervised convolutional neural network architecture (WSCNN) that combines multiple-instance learning (MIL) with CNN to further improve the prediction of protein-DNA binding. WSCNN first divides each DNA sequence (bag) into multiple overlapping subsequences (instances) with a sliding window, then models each instance separately using a CNN, and finally fuses the predicted scores of all instances in the same bag using one of four fusion methods: Max, Average, Linear Regression, and Top-Bottom Instances. Experimental results on in vivo and in vitro datasets illustrate the performance of the proposed approach. Moreover, models built on in vitro data using WSCNN can predict in vivo protein-DNA binding with good accuracy. In addition, through a series of experiments, we give a quantitative analysis of the importance of the reverse-complement mode in predicting in vivo protein-DNA binding, and explain why advanced pooling layers are not used directly to combine MIL with CNN.
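
The pipeline described in the abstract (sliding-window instances, per-instance scoring, score fusion, reverse-complement mode) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the window size, stride, the value of k, the linear weights, and the exact form of the Top-Bottom fusion (here, the mean of the k highest and k lowest instance scores) are all assumptions, and `toy_score` is a hypothetical stand-in for the trained CNN.

```python
from typing import Callable, List, Sequence

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def split_into_instances(seq: str, window: int, stride: int) -> List[str]:
    """Slide a fixed-size window over seq and return overlapping instances.

    A trailing remainder shorter than one stride is dropped (an assumption).
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(seq) <= window:
        return [seq]
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fuse_max(scores: Sequence[float]) -> float:
    """Bag score is the highest instance score."""
    return max(scores)


def fuse_average(scores: Sequence[float]) -> float:
    """Bag score is the mean instance score."""
    return sum(scores) / len(scores)


def fuse_linear(scores: Sequence[float], weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Bag score is a weighted sum of instance scores.

    In a real model the weights would be learned; here they are supplied.
    """
    return bias + sum(w * s for w, s in zip(weights, scores))


def fuse_top_bottom(scores: Sequence[float], k: int = 1) -> float:
    """Assumed form: mean of the k highest and k lowest instance scores."""
    ordered = sorted(scores)
    k = max(1, min(k, len(ordered)))
    return (sum(ordered[-k:]) + sum(ordered[:k])) / (2 * k)


def toy_score(instance: str) -> float:
    """Hypothetical scorer standing in for the CNN: 1.0 if a motif is present."""
    return 1.0 if "TGACTCA" in instance else 0.0


def score_bag(seq: str, scorer: Callable[[str], float],
              fuse: Callable[[Sequence[float]], float],
              window: int = 10, stride: int = 2,
              use_reverse_complement: bool = True) -> float:
    """Score a sequence (bag) by fusing the scores of its instances."""
    instances = split_into_instances(seq, window, stride)
    if use_reverse_complement:
        # Assumed handling: score each instance on both strands, keep the max.
        scores = [max(scorer(x), scorer(reverse_complement(x))) for x in instances]
    else:
        scores = [scorer(x) for x in instances]
    return fuse(scores)
```

For example, `score_bag(seq, toy_score, fuse_max)` labels a bag as bound if any instance on either strand contains the motif, whereas `fuse_average` dilutes a single strong instance across the whole bag.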

Authors

Qinhu Zhang
Lin Zhu

Institute of Environmental Technology, College of Environmental and Resource Sciences; Zhejiang University, Hangzhou 310058, China.
Wenzheng Bao

School of Information Engineering, Xuzhou University of Technology, Xuzhou, China.
De-Shuang Huang

Keywords

Algorithms; Animals; Binding Sites; Computational Biology; DNA; DNA-Binding Proteins; Mice; Neural Networks, Computer; Protein Binding; Supervised Machine Learning; Transcription Factors

External Resources

PubMed ID (PMID): 30106688
