ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence.

Journal: Journal of Molecular Biology

Published Date: Mar 27, 2020

Abstract

The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for understanding almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we describe a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of proteins to DNA, RNA, and other proteins. First, we developed several new methods to predict whether or not proteins bind (per-protein prediction). Second, we developed independent methods that predict which residues bind (per-residue prediction). Although it requires no three-dimensional information, the system can predict the actual binding residues. The system combined homology-based inference with machine learning, and motif-based profile-kernel approaches with word-based (ProtVec) solutions, for machine-learning protein-level predictions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (± one standard error), corresponding to a 1.8-fold improvement over random (best classification for protein-protein binding with F1 = 91 ± 0.8%). Standard neural networks for per-residue predictions performed best for DNA-binding (Q2 = 81 ± 0.9%), followed by RNA-binding (Q2 = 80 ± 1%), and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through GitHub (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
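The abstract reports its results as Q2, F1, and a standard error. The sketch below illustrates how such scores could be computed on toy per-residue labels. It assumes that Q2 denotes two-state (binding vs. non-binding) accuracy and that the standard error is taken over a list of per-sample scores. The function names and the toy data are hypothetical and are not taken from the ProNA2020 code base.

```python
from math import sqrt


def q2(true_labels, pred_labels):
    """Two-state accuracy: fraction of residues whose state
    (1 = binding, 0 = non-binding) is predicted correctly.
    Assumed definition of Q2, not confirmed by the abstract."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


def f1(true_labels, pred_labels):
    """Harmonic mean of precision and recall for the positive (binding) class."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def standard_error(values):
    """Standard error of the mean, using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return sqrt(variance / n)


# Hypothetical toy example: one protein with eight residues.
observed = [1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]

toy_q2 = q2(observed, predicted)
toy_f1 = f1(observed, predicted)
```

On this toy input, six of eight residues are predicted correctly, so `toy_q2` is 0.75. Precision and recall are both 2/3, so `toy_f1` is also 2/3.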

Authors

Jiajun Qiu

Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany. Electronic address: jiajunqiu@hotmail.com.
Michael Bernhofer

Department of Informatics & Center for Bioinformatics & Computational Biology - i12, Technische Universität München (TUM), Boltzmannstr. 3, Garching/Munich, 85748, Germany. Michael.Bernhofer@mytum.de.
Michael Heinzinger

Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany. mheinzinger@rostlab.org.
Sofie Kemper

Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany.
Tomas Norambuena

Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile.
Francisco Melo

Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile; Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile.
Burkhard Rost

Keywords

Animals; Binding Sites; Computational Biology; DNA; Eukaryota; Humans; Machine Learning; Neural Networks, Computer; Nucleic Acid Conformation; Prokaryotic Cells; Protein Binding; Protein Conformation; Proteins; RNA; Sequence Analysis, Protein; Software

External Resources

PubMed ID (PMID): 32142788
