Deep Dive into Machine Learning Models for Protein Engineering

Journal: Journal of Chemical Information and Modeling

Abstract

Protein redesign and engineering have become important tasks in pharmaceutical research and development. Recent technological advances have enabled efficient protein redesign by mimicking the natural evolutionary steps of mutation, selection, and amplification in the laboratory. For any given protein, however, the number of possible mutations is astronomical, so it is impractical to synthesize every sequence or even to investigate all functionally interesting variants. There has therefore been growing interest in using machine learning to assist protein redesign, since prediction models can virtually screen large numbers of novel sequences. However, many state-of-the-art machine learning models, especially deep learning models, have not been extensively explored, and only a small selection of protein sequence descriptors has been considered. In this work, we benchmark the performance of prediction models built with an array of machine learning methods and protein descriptor types, including two novel single amino acid descriptors and one structure-based three-dimensional descriptor. The predictions were evaluated on a diverse collection of public and proprietary data sets using a variety of evaluation metrics. The results of this comparison suggest that convolutional neural network models built with amino acid property descriptors are the most widely applicable to the types of protein redesign problems faced in the pharmaceutical industry.
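To make the pipeline described above more concrete, here is a minimal pure-Python sketch, under stated assumptions, of two ideas: why exhaustive synthesis is impractical (the number of variants carrying exactly k substitutions in a protein of length L is C(L, k) * 19^k), and how a convolutional model can consume per-residue amino acid property descriptors to virtually screen variants. The two property scales (Kyte-Doolittle hydropathy and a simplified +1/-1 side-chain charge), the filter weights, and the helper names (`n_variants`, `encode`, `conv_score`, `single_mutants`, `screen`) are hypothetical illustrations; they are not the descriptors or architecture used in the paper, and a real model would learn its weights from assay data.

```python
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative property scales (assumption, not the paper's descriptors):
# Kyte-Doolittle hydropathy and a simplified side-chain charge at neutral pH.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {aa: 0.0 for aa in AMINO_ACIDS}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})

# Hypothetical, untrained filter: width 3 positions x 2 descriptor channels.
KERNEL = [[0.5, -0.2], [1.0, 0.0], [0.5, -0.2]]


def n_variants(length, k):
    """Sequences differing from a parent of `length` residues at exactly k
    positions, with 19 alternative amino acids at each mutated position."""
    return comb(length, k) * 19 ** k


def encode(seq):
    """Encode a sequence as an L x 2 matrix of per-residue property descriptors."""
    return [[HYDROPATHY[aa], CHARGE[aa]] for aa in seq]


def conv_score(features, kernel, bias=0.0):
    """One convolutional filter (valid cross-correlation over positions and
    channels), ReLU activation, then global max pooling to a scalar score."""
    width = len(kernel)
    if len(features) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for i in range(len(features) - width + 1):
        s = bias
        for j in range(width):
            for c in range(len(kernel[j])):
                s += features[i + j][c] * kernel[j][c]
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # global max pooling


def single_mutants(seq):
    """Yield every single-substitution variant of seq."""
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield seq[:i] + aa + seq[i + 1:]


def screen(parent, top_n=3):
    """Virtually screen all single mutants of parent; return the top_n
    (score, sequence) pairs, highest score first."""
    scored = [(conv_score(encode(v), KERNEL), v) for v in single_mutants(parent)]
    scored.sort(reverse=True)
    return scored[:top_n]
```

For a 100-residue protein, `n_variants(100, 3)` already gives 1,109,100,300 triple mutants, which is why scoring candidates in silico before synthesis is attractive. Global max pooling is used here so that the score does not depend on sequence length; this is one common design choice, not necessarily the one made in the paper.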

Authors

  • Yuting Xu
    Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey 07065, United States.
  • Deeptak Verma
    Computational and Structural Chemistry, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States.
  • Robert P Sheridan
  • Andy Liaw
    Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey 07065, United States.
  • Junshui Ma
    Biometrics Research Department and Structural Chemistry Department, Merck Research Laboratories, Rahway, New Jersey 07065, United States.
  • Nicholas M Marshall
    Invenra, Inc., 505 South Rosa Road, Madison, Wisconsin 53719, United States.
  • John McIntosh
    Process Research & Development, Merck & Co., Inc., Rahway, New Jersey 07065, United States.
  • Edward C Sherer
    Computational and Structural Chemistry, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States.
  • Vladimir Svetnik
    Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey 07065, United States.
  • Jennifer M Johnston
    Computational and Structural Chemistry, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States.