LM-GVP: an extensible sequence and structure informed deep learning framework for protein property prediction.

Journal: Scientific Reports
Published Date:

Abstract

Proteins perform many essential functions in biological systems and can be successfully developed as bio-therapeutics. Predicting their properties from a proposed sequence and structure is therefore invaluable. In this study, we developed a novel, generalizable deep learning framework, LM-GVP, composed of a protein Language Model (LM) and a Graph Neural Network (GNN), to leverage information from both the 1D amino acid sequences and the 3D structures of proteins. Our approach outperformed state-of-the-art protein LMs on a variety of property prediction tasks, including fluorescence, protease stability, and protein functions from Gene Ontology (GO). We also showed how a GNN prediction head can inform the fine-tuning of protein LMs so that they better leverage structural information. We envision that our deep learning framework will generalize to many protein property prediction problems and greatly accelerate protein engineering and drug development.
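
The general idea of combining per-residue sequence features with a graph built from 3D coordinates can be illustrated with a toy sketch. This is not the LM-GVP implementation: the random embedding table stands in for a trained protein LM, the k-nearest-neighbor graph and mean aggregation stand in for a real GNN layer, and every name here (`embed_sequence`, `knn_graph`, `message_pass`, `predict_property`), along with the example sequence, coordinates, and weights, is a hypothetical chosen for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed_sequence(seq, dim=8, seed=0):
    """Stand-in for a protein LM: one fixed random vector per amino-acid type."""
    rng = random.Random(seed)
    table = {aa: [rng.uniform(-1, 1) for _ in range(dim)] for aa in AMINO_ACIDS}
    return [table[aa] for aa in seq]


def knn_graph(coords, k=2):
    """For each residue, return the indices of its k nearest residues in 3D."""
    neighbors = []
    for i, ci in enumerate(coords):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def message_pass(features, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def predict_property(seq, coords, weights, k=2):
    """Sequence features -> structure-aware update -> mean pool -> linear score."""
    feats = embed_sequence(seq, dim=len(weights))
    feats = message_pass(feats, knn_graph(coords, k))
    pooled = [sum(col) / len(feats) for col in zip(*feats)]
    return sum(w * x for w, x in zip(weights, pooled))


# Hypothetical 5-residue example with made-up coordinates (in angstroms).
seq = "MKTAY"
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
weights = [0.1] * 8  # untrained placeholder weights
score = predict_property(seq, coords, weights)
```

In a real system the embedding table would be replaced by a trained LM whose weights are fine-tuned jointly with the graph layers, so that gradients from the structure-aware prediction head flow back into the sequence model.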

Authors

  • Zichen Wang
    Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Steven A Combs
  • Ryan Brand
    Amazon Machine Learning Solutions Lab, Amazon Web Services, Santa Clara, CA, USA.
  • Miguel Romero Calvo
    Amazon Machine Learning Solutions Lab, Amazon Web Services, Santa Clara, CA, USA.
  • Panpan Xu
  • George Price
    Amazon Machine Learning Solutions Lab, Amazon Web Services, Santa Clara, CA, USA.
  • Nataliya Golovach
    Janssen Biotherapeutics, The Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, PA, USA.
  • Emmanuel O Salawu
    Amazon Machine Learning Solutions Lab, Amazon Web Services, Santa Clara, CA, USA.
  • Colby J Wise
    Amazon Machine Learning Solutions Lab, Amazon Web Services, Santa Clara, CA, USA.
  • Sri Priya Ponnapalli
    Amazon Machine Learning Solutions Lab, Amazon Web Services, Santa Clara, CA, USA. priyapo@amazon.com.
  • Peter M Clark
    Janssen Biotherapeutics, The Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, PA, USA. PClark3@its.jnj.com.