BIPSPI+: Mining Type-Specific Datasets of Protein Complexes to Improve Protein Binding Site Prediction.

Journal: Journal of molecular biology
PMID:

Abstract

Computational approaches for predicting protein-protein interfaces are extremely useful for understanding and modelling the quaternary structure of protein assemblies. In particular, partner-specific binding site prediction methods make it possible to delineate the specific residues that compose the interface of protein complexes. In recent years, new machine learning and other algorithmic approaches have been proposed to solve this problem. However, little effort has been devoted to finding better training datasets to improve the performance of these methods. To demonstrate the importance of the training set compilation procedure, in this work we present BIPSPI+, a new version of our original server trained on carefully curated datasets that outperforms our original predictor. We show how prediction performance can be improved by selecting specific datasets that better describe particular types of protein interactions and interfaces (e.g. homo/hetero). In addition, our upgraded web server offers a new set of functionalities, such as a sequence-structure prediction mode, hetero- or homo-complex specialization, and a guided docking tool that computes 3D quaternary structure poses using the predicted interfaces. BIPSPI+ is freely available at https://bipspi.cnb.csic.es.

Authors

  • R Sanchez-Garcia
    Biocomputing Unit, National Center for Biotechnology (CSIC), Darwin 3, Campus Univ. Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain; Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 29 St Giles', Oxford OX1 3LB, UK. Electronic address: ruben.sanchez-garcia@stats.ox.ac.uk.
  • J R Macias
    Biocomputing Unit, National Center for Biotechnology (CSIC), Darwin 3, Campus Univ. Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain.
  • C O S Sorzano
    GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain.
  • J M Carazo
    GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain.
  • J Segura
    Bioanalysis Research Group, IMIM, Hospital del Mar Medical Research Institute, Doctor Aiguader 88, 08003 Barcelona, Spain; Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain.