In silico design of smaller size enzymatic protein by generative artificial intelligence (ProtGPT2).

Journal: Journal of bioscience and bioengineering

Abstract

The construction of small proteins by removing amino acid subsequences that are not involved in function, activity, or structure is crucial for bioprocessing and drug development. Traditional design methods often focus on reconstructing functional motifs, but they face challenges in stabilizing structure and reproducing function. In this study, we aimed to develop a design method for small proteins using ProtGPT2, a model that generates protein sequences based on function and structure. First, amino acid sequence data of malate dehydrogenase (MDH) were collected, and ProtGPT2 was fine-tuned on them (ProtGPT2 for MDH). Evaluation of the chain length and perplexity (ppl) of the generated sequences showed that the model produced sequences shorter than the natural ones. The validity of the generated sequences was assessed using both population and individual analyses. Population analysis, including multiple sequence alignment (MSA) and t-distributed stochastic neighbor embedding (tSNE), revealed that ProtGPT2 for MDH identified functional motifs of MDH and incorporated them into the generated sequences. Additionally, tSNE showed that the generated sequences were highly similar to natural MDH sequences. In individual analysis, 10 randomly selected sequences were evaluated using BLAST, AlphaFold2, and InterPro. BLAST indicated that 9 sequences were novel MDH variants. AlphaFold2 confirmed that their 3D structures were highly similar to known MDH structures. InterPro identified domains and active sites in 2 sequences, suggesting that they were novel, small MDH variants. In conclusion, ProtGPT2 for MDH has the potential to design amino acid sequence candidates for small MDHs. The validity and utility of the model remain to be established through future experimental work.
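The abstract scores generated sequences by chain length and perplexity but gives neither the formula nor the selection criteria. The following is a minimal sketch, assuming the standard definition of perplexity as the exponential of the negative mean per-token log-probability. The function names (`perplexity`, `select_candidates`), the thresholds, and the toy log-probabilities are hypothetical illustrations, not the authors' pipeline; in practice the log-probabilities would come from the fine-tuned model.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean of per-token natural-log probabilities)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def select_candidates(scored, max_length, max_ppl):
    """Keep sequences that are short enough and have low enough perplexity.

    `scored` is a list of (sequence, token_log_probs) pairs.
    `max_length` and `max_ppl` are hypothetical cut-offs chosen by the user;
    the abstract does not state the thresholds used in the study.
    Returns (sequence, ppl) pairs sorted by ascending perplexity.
    """
    kept = []
    for seq, log_probs in scored:
        ppl = perplexity(log_probs)
        if len(seq) <= max_length and ppl <= max_ppl:
            kept.append((seq, ppl))
    return sorted(kept, key=lambda item: item[1])


# Toy input: the sequences and log-probabilities are made up for illustration.
toy = [
    ("MKV", [math.log(0.5)] * 3),          # short, confident
    ("MKVAAAAAAA", [math.log(0.9)] * 10),  # confident but too long
    ("MA", [math.log(0.1)] * 2),           # short but high perplexity
]
candidates = select_candidates(toy, max_length=5, max_ppl=5.0)
```

A lower perplexity means the model assigns higher average probability to the sequence, so combining it with a length cut-off is one plausible way to shortlist compact candidates before the structural and domain checks (BLAST, AlphaFold2, InterPro) described above.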

Authors

  • Hiroyuki Hamada
    Department of Bioscience and Biotechnology, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan. Electronic address: hamada@brs.kyushu-u.ac.jp.
  • Tamon Matsuzawa
    Department of Bioscience and Biotechnology, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan.
  • Taizo Hanai
    Department of Bioscience and Biotechnology, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan.