High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for predicting residue-residue contacts rely on further sources of information, or features, of protein sequences, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that, with deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation.
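
To make "amino-acid pair frequency or covariance data derived directly from sequence alignments" concrete, the sketch below computes both quantities from a toy alignment with NumPy. It is an illustration under stated assumptions, not the DeepCov implementation: the 21-letter alphabet (20 amino acids plus gap), the absence of sequence weighting and pseudocounts, the function names `encode` and `pair_statistics`, and the final (L, L, 441) layout are all choices made here for clarity.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def encode(alignment):
    """One-hot encode aligned sequences into shape (n_seqs, length, 21)."""
    n_seqs, length = len(alignment), len(alignment[0])
    onehot = np.zeros((n_seqs, length, len(ALPHABET)))
    for s, seq in enumerate(alignment):
        for i, aa in enumerate(seq):
            # Unknown symbols are treated as gaps (an assumption of this sketch).
            onehot[s, i, INDEX.get(aa, INDEX["-"])] = 1.0
    return onehot


def pair_statistics(alignment):
    """Return single-site frequencies, pair frequencies, and covariances."""
    x = encode(alignment)
    n_seqs = x.shape[0]
    single = x.mean(axis=0)  # f_i(a), shape (L, 21)
    # f_ij(a, b): fraction of sequences with a at column i and b at column j.
    pair = np.einsum("sia,sjb->iajb", x, x) / n_seqs  # shape (L, 21, L, 21)
    # Covariance: f_ij(a, b) - f_i(a) * f_j(b).
    cov = pair - single[:, :, None, None] * single[None, None, :, :]
    return single, pair, cov


def as_image(stat):
    """Rearrange (L, 21, L, 21) into an (L, L, 441) map, one channel per
    amino-acid pair; a layout assumed here as convenient for a 2D CNN."""
    length = stat.shape[0]
    return stat.transpose(0, 2, 1, 3).reshape(length, length, -1)


toy_alignment = ["AC", "AC", "GD", "GD"]
single, pair, cov = pair_statistics(toy_alignment)
features = as_image(cov)
```

In the toy alignment, A at column 1 always co-occurs with C at column 2, so the corresponding covariance entry is positive, while A paired with D never occurs and its entry is negative. Note that these are purely local statistics: nothing here inverts a covariance matrix or fits a global model.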

Authors

  • David T Jones
    Department of Computer Science, Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, United Kingdom. d.t.jones@ucl.ac.uk.
  • Shaun M Kandathil
    Department of Computer Science, University College London, London, UK.