Using machine learning to predict the effects and consequences of mutations in proteins.

Journal: Current opinion in structural biology

Abstract

Machine and deep learning approaches can leverage the increasingly available massive datasets of protein sequences, structures, and mutational effects to predict variants with improved fitness. Many different approaches are being developed, but systematic benchmarking studies indicate that although the specifics of the machine learning algorithms matter, the more important constraint is the availability and quality of the data used during training. In cases where little experimental data are available, unsupervised and self-supervised pre-training on generic protein datasets can still perform well after subsequent refinement via hybrid or transfer learning approaches. Overall, recent progress in this field has been staggering, and machine learning approaches will likely play a major role in future breakthroughs in protein biochemistry and engineering.

Authors

  • Daniel J Diaz
    Center for Systems and Synthetic Biology, Department of Chemistry, and Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA.
  • Anastasiya V Kulikova
    Center for Systems and Synthetic Biology and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA.
  • Andrew D Ellington
    Center for Systems and Synthetic Biology and Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA.
  • Claus O Wilke
    Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA. wilke@austin.utexas.edu.