TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning.

Journal: Journal of Chemical Theory and Computation
Published Date:

Abstract

Protein domains are independent, functional, and stable structural units of proteins. Accurate protein domain boundary prediction plays an important role in understanding protein structure and evolution, and in protein structure prediction. Current domain boundary prediction methods differ in boundary definition, methodology, and training databases, resulting in disparate performance across proteins. We developed TopDomain, an exhaustive metapredictor that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set, termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from the CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art and homology-based domain boundary predictors. Finally, we implemented TopDomain, which accurately predicts whether domain parsing is necessary for the target protein.
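
The tolerance-based F1 scores quoted above can be illustrated with a minimal sketch. The abstract does not state the exact matching protocol, so everything here is an assumption for illustration: the function name `boundary_f1` is hypothetical, and the greedy one-to-one matching of predicted to true boundaries is one plausible choice, not necessarily the procedure used for TopDomain.

```python
from typing import Sequence


def boundary_f1(predicted: Sequence[int], true: Sequence[int], tolerance: int) -> float:
    """F1 score over boundary positions (residue indices).

    Assumption: a predicted boundary counts as a true positive when it lies
    within `tolerance` residues of a true boundary that has not already been
    matched to another prediction.
    """
    unmatched = sorted(true)
    true_positives = 0
    for pos in sorted(predicted):
        # Greedy one-to-one matching: take the closest unmatched true boundary
        # that falls inside the tolerance window.
        candidates = [t for t in unmatched if abs(t - pos) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda t: abs(t - pos))
            unmatched.remove(best)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(true)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up positions, not data from the paper).
print(boundary_f1([95, 250], [110, 245], tolerance=20))
print(boundary_f1([95, 250], [110, 245], tolerance=10))
```

In this toy case the prediction at residue 95 is 15 residues from the true boundary at 110, so it counts at ±20 but not at ±10, which is why a tighter tolerance can only lower or preserve the score.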

Authors

  • Daniel Mulnaes
    Department of Mathematics and Natural Sciences, Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Universitätsstrasse 1, 40225 Düsseldorf, Germany.
  • Pegah Golchin
    Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany.
  • Filip Koenig
    Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany.
  • Holger Gohlke
    Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany; Institute of Bio- and Geosciences (IBG4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany. Electronic address: gohlke@hhu.de.