Transformer Neural Networks for Protein Family and Interaction Prediction Tasks.

Journal: Journal of Computational Biology: A Journal of Computational Molecular Cell Biology

Abstract

The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they either face computational limitations or are designed to solve a single, specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is then fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification while being much more general than other architectures. Furthermore, it outperforms other approaches for protein interaction prediction in two of the three scenarios that we generated. These results offer a promising framework for fine-tuning the pre-trained sequence representations on other protein prediction tasks.
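As a rough illustration of the pre-train-then-fine-tune pattern the abstract describes, the sketch below pairs a task-agnostic Transformer encoder over amino-acid tokens with two interchangeable heads: a masked-residue head for pre-training and a family-classification head for fine-tuning. All layer sizes, token handling, and module names are illustrative assumptions written in PyTorch; this is not the authors' actual architecture.

```python
# Minimal sketch of pre-training a task-agnostic protein encoder and
# fine-tuning it with a task-specific head. Hyperparameters and names
# are assumptions for illustration only.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
PAD, MASK = 0, 1                        # assumed special token ids
VOCAB_SIZE = len(AMINO_ACIDS) + 2

class ProteinEncoder(nn.Module):
    """Task-agnostic Transformer encoder over amino-acid tokens."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):          # tokens: (batch, seq_len)
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok(tokens) + self.pos(positions)
        return self.encoder(x, src_key_padding_mask=tokens.eq(PAD))

class MaskedTokenHead(nn.Module):
    """Pre-training head: predict the identity of masked residues."""
    def __init__(self, d_model=128):
        super().__init__()
        self.proj = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, hidden):
        return self.proj(hidden)        # (batch, seq_len, vocab)

class FamilyClassifierHead(nn.Module):
    """Fine-tuning head: classify a pooled sequence representation."""
    def __init__(self, n_families, d_model=128):
        super().__init__()
        self.proj = nn.Linear(d_model, n_families)

    def forward(self, hidden):
        return self.proj(hidden.mean(dim=1))   # mean-pool over residues

# Usage: pre-train encoder + MaskedTokenHead on unlabeled sequences, then
# reuse the same encoder with FamilyClassifierHead (or a pair-scoring head
# for interaction prediction) on the labeled downstream task.
encoder = ProteinEncoder()
finetune_head = FamilyClassifierHead(n_families=100)

tokens = torch.randint(2, VOCAB_SIZE, (8, 64))  # toy batch of 8 sequences
family_logits = finetune_head(encoder(tokens))  # (8, 100)
```

The point of the pattern is that only the small task-specific head changes between tasks, while the pre-trained encoder weights are reused, which is what lets the same sequence representations serve both family classification and interaction prediction.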

Authors

  • Ananthan Nambiar
    Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
  • Simon Liu
Medical Genomics Unit, National Human Genome Research Institute, Bethesda, Maryland, USA.
  • Maeve Heflin
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
  • John Malcolm Forsyth
    Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
  • Sergei Maslov
    Biology Department, Brookhaven National Laboratory, Upton, New York, USA.
  • Mark Hopkins
School of Food Science and Nutrition, Faculty of Mathematics and Physical Sciences, University of Leeds, Leeds, UK.
  • Anna Ritz
    Department of Biology, Reed College, Portland, Oregon, USA.