SPOTONE: Hot Spots on Protein Complexes with Extremely Randomized Trees via Sequence-Only Features.

Journal: International journal of molecular sciences
Published Date:

Abstract

Protein Hot-Spots (HS) are experimentally determined amino acids that are key to small-ligand binding and tend to be structural landmarks in protein-protein interactions. As such, they have been extensively targeted by structure-based Machine Learning (ML) prediction methods. However, because far more protein sequences are available than determined three-dimensional structures, a sequence-based HS predictor has the potential to be more useful to the scientific community. Herein, we present SPOTONE, a new ML predictor able to accurately classify protein HS via sequence-only features. This algorithm shows accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively, on an independent testing set. The algorithm is deployed within a free-to-use webserver at http://moreiralab.com/resources/spotone, only requiring the user to submit a FASTA file with one or more protein sequences.
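
To illustrate what a sequence-only, per-residue input might look like, the following is a minimal sketch that reads FASTA-formatted text and produces one sliding window per residue. The function names (parse_fasta, residue_windows), the flank size, and the "-" padding character are assumptions made for illustration; they are not taken from SPOTONE, whose actual feature set is not described in this abstract.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[header] += line.upper()
    return records


def residue_windows(
    sequence: str, flank: int = 2, pad: str = "-"
) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for every residue.

    The window holds `flank` neighbours on each side; positions past
    the sequence ends are filled with `pad` (an illustrative choice).
    """
    padded = pad * flank + sequence + pad * flank
    return [
        (i + 1, sequence[i], padded[i:i + 2 * flank + 1])
        for i in range(len(sequence))
    ]


# Hypothetical two-protein input, standing in for a submitted FASTA file.
example = """>protein_A
MKTAYIAK
QR
>protein_B
GSHM
"""

records = parse_fasta(example.splitlines())
windows = residue_windows(records["protein_B"], flank=1)
```

Each window could then be encoded numerically and passed to a per-residue classifier such as an extremely randomized trees model; that encoding step is left out here because the abstract does not specify it.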

Authors

  • A J Preto
    CNC-Center for Neuroscience and Cell Biology, University of Coimbra, 3004-504 Coimbra, Portugal.
  • Irina S Moreira
    CNC-Center for Neuroscience and Cell Biology; Rua Larga, Faculdade de Medicina, Polo I, 1º andar, Universidade de Coimbra, 3004-504 Coimbra, Portugal. irina.moreira@cnc.uc.pt.