EMOCPD: Efficient Attention-Based Models for Computational Protein Design Using Amino Acid Microenvironment.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Computational protein design (CPD) refers to the use of computational methods to design proteins. Traditional methods, which rely on energy functions and heuristic search algorithms for sequence design, are inefficient and do not meet the demands of the big data era in biomolecules; their accuracy is limited by the energy functions and search algorithms they use. Existing deep learning methods are constrained by the learning capacity of their networks and fail to extract effective information from sparse protein structures, which limits the accuracy of protein design. To address these shortcomings, we developed EMOCPD, an efficient attention-based model for computational protein design using the amino acid microenvironment. It predicts the category of each amino acid in a protein by analyzing the three-dimensional atomic environment surrounding that amino acid, and optimizes the protein based on the amino acid categories predicted with high probability. EMOCPD employs a multihead attention mechanism to focus on important features in the sparse protein microenvironment and uses an inverse residual structure to optimize the network architecture. In protein design, the mutants predicted by EMOCPD show significantly improved thermal stability and protein expression compared with the wild type, which validates EMOCPD's potential for designing superior proteins. Furthermore, the content of each of the 20 amino acids in a protein influences EMOCPD's predictions positively, negatively, or minimally, so the amino acids can be categorized as positive, negative, or neutral. The findings indicate that EMOCPD is better suited to designing proteins with lower contents of negative amino acids.
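
The design step described in the abstract, choosing substitutions from the amino acid categories predicted with high probability, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `propose_mutations`, the amino acid ordering in `AMINO_ACIDS`, and the `min_prob` threshold are all assumptions made for illustration, and the per-residue probabilities are taken as given rather than produced by EMOCPD.

```python
# Hypothetical helper, not taken from the paper: ranks candidate point
# mutations from per-residue class probabilities over the 20 amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed class ordering


def propose_mutations(wild_type, probs, min_prob=0.5):
    """Return (position, wild_aa, predicted_aa, probability) tuples.

    wild_type: sequence string in one-letter codes.
    probs: one list of 20 probabilities per residue, ordered as AMINO_ACIDS.
    min_prob: assumed confidence threshold for suggesting a substitution.
    """
    if len(wild_type) != len(probs):
        raise ValueError("need one probability vector per residue")
    candidates = []
    for pos, (wild_aa, p) in enumerate(zip(wild_type, probs), start=1):
        if len(p) != len(AMINO_ACIDS):
            raise ValueError("each probability vector must have 20 entries")
        # Index of the most probable amino acid category at this position.
        best = max(range(len(p)), key=p.__getitem__)
        predicted_aa = AMINO_ACIDS[best]
        # Suggest a change only when the model confidently disagrees
        # with the wild-type residue.
        if predicted_aa != wild_aa and p[best] >= min_prob:
            candidates.append((pos, wild_aa, predicted_aa, p[best]))
    # Most confident suggestions first.
    candidates.sort(key=lambda c: c[3], reverse=True)
    return candidates
```

Positions are reported 1-indexed, following the usual convention for naming point mutations. In practice the threshold and any further filtering (for example, by experimental constraints) would depend on the protein being designed.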

Authors

  • Xiaoqi Ling
    The School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.
  • Cheng Cai
    Hubei Provincial Clinical Research Center for Alzheimer's Disease (XYX, LYH, DL, GRC, FFH, JZ, JJZ, GBH, JWG, XCL, JYW, DYZ, JL, QQN, DS, SYL, CC, YYC, LX, YMO, XXC, YLZ, YSC, JQL, ZW, QW, YFM, YZ), Tian You Hospital Affiliated to Wuhan University of Science and Technology, Wuhan.
  • Demin Kong
    The School of Biotechnology, Jiangnan University, Wuxi 214122, China.
  • Zhisheng Wei
    School of Biotechnology and Key Laboratory of Industrial Biotechnology Ministry in Jiangnan University, Wuxi, Jiangsu 214012, China.
  • Jing Wu
    School of Pharmaceutical Science, Jiangnan University, Wuxi, 214122, Jiangsu, China.
  • Lei Wang
    Department of Nursing, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China.
  • Zhaohong Deng
    School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China.