Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models.

Journal: Briefings in bioinformatics
PMID:

Abstract

Protein engineering is an emerging field in biotechnology with the potential to revolutionize areas such as antibody design, drug discovery, food security, and ecology. However, the mutational space involved is too vast to be explored through experimental means alone. By leveraging accumulating protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based, ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive and systematic set of indispensable methodological components, including TDA and NLP, for protein engineering, and to facilitate their future development.

Authors

  • Yuchi Qiu
    Department of Mathematics, Michigan State University, East Lansing, MI, USA.
  • Guo-Wei Wei
    Department of Mathematics, Department of Electrical and Computer Engineering, Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.