Generative AI Models for the Protein Scaffold Filling Problem.

Journal: Journal of computational biology : a journal of computational molecular cell biology
PMID:

Abstract

De novo protein sequencing is an important problem in proteomics, playing a crucial role in understanding protein function and in drug discovery and design, evolutionary studies, and related areas. Top-down and bottom-up tandem mass spectrometry are popular approaches for analyzing and sequencing proteins. However, these approaches often produce incomplete protein sequences with gaps, known as scaffolds. The protein scaffold filling problem is to fill in the missing amino acids in the gaps of a scaffold to infer the complete protein sequence. In this article, we tackle the protein scaffold filling problem using generative AI techniques, such as convolutional denoising autoencoder, transformer, and generative pretrained transformer (GPT) models, to complete the protein sequences, and we compare our results with a recently developed convolutional long short-term memory-based sequence model. We evaluate model performance on both a real dataset and generated datasets. All proposed models show outstanding prediction accuracy. Notably, the GPT-2 model achieves 100% gap-filling accuracy and 100% full sequence accuracy on the MabCampth protein scaffold, outperforming the other models.
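
As an illustration of the problem setup and the two reported metrics, the following minimal Python sketch may help. It is not the authors' implementation. The abstract does not say how gaps are encoded or how the accuracies are computed, so the sketch assumes that each missing residue is marked by one placeholder symbol (the hypothetical `GAP` character), that gap-filling accuracy is the fraction of gap positions predicted correctly, and that full sequence accuracy is the fraction of all positions where the completed sequence matches the target. The function names and the toy sequence are also hypothetical.

```python
GAP = "-"  # hypothetical placeholder for one missing amino acid


def make_scaffold(target: str, gap_positions: set[int]) -> str:
    """Mask the given positions of a known sequence to build a toy scaffold."""
    return "".join(GAP if i in gap_positions else aa for i, aa in enumerate(target))


def gap_filling_accuracy(scaffold: str, predicted: str, target: str) -> float:
    """Fraction of gap positions whose predicted residue matches the target.

    Assumed definition; the abstract does not state the exact formula.
    """
    if not (len(scaffold) == len(predicted) == len(target)):
        raise ValueError("scaffold, predicted and target must have equal length")
    gaps = [i for i, symbol in enumerate(scaffold) if symbol == GAP]
    if not gaps:
        return 1.0  # nothing to fill, so nothing can be wrong
    correct = sum(predicted[i] == target[i] for i in gaps)
    return correct / len(gaps)


def full_sequence_accuracy(predicted: str, target: str) -> float:
    """Fraction of all positions where the completed sequence matches the target.

    Assumed definition; the abstract does not state the exact formula.
    """
    if len(predicted) != len(target) or not target:
        raise ValueError("predicted and target must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


# Toy example (made-up sequence, not data from the article).
target = "MKTAYIAK"
scaffold = make_scaffold(target, {2, 5})   # "MK-AY-AK"
predicted = "MKTAYLAK"                     # first gap right, second gap wrong

print(scaffold)
print(gap_filling_accuracy(scaffold, predicted, target))
print(full_sequence_accuracy(predicted, target))
```

Under these assumed definitions, a model that restores every masked residue scores 1.0 on both metrics, which would correspond to the 100% figures reported for GPT-2. Real scaffolds may have gaps of unknown length, which this sketch does not model.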

Authors

  • Letu Qingge
    Department of Computer Science, North Carolina A&T State University, Greensboro, North Carolina, USA.
  • Kushal Badal
    Department of Computer Science, North Carolina A&T State University, Greensboro, North Carolina, USA.
  • Richard Annan
    Department of Computer Science, North Carolina A&T State University, Greensboro, North Carolina, USA.
  • Jordan Sturtz
    Department of Computer Science, North Carolina A&T State University, Greensboro, North Carolina, USA.
  • Xiaowen Liu
    School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States.
  • Binhai Zhu
    Gianforte School of Computing, Montana State University, Bozeman, Montana, USA.