DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes.

Journal: International journal of molecular sciences

Abstract

The molecular weight (MW) of an enzyme is a critical parameter in enzyme-constrained models (ecModels). It is determined by two factors: the subunits that make up the enzyme and the number of copies of each subunit. Although the number of subunits (NS) can in principle be obtained from UniProt, this information is unavailable for most proteins. In this study, we addressed this gap by extracting and curating subunit information from the UniProt database to establish a robust benchmark dataset. We then propose a novel model, DeepSub, which combines a protein language model with a bidirectional gated recurrent unit (GRU) to predict NS in homo-oligomers solely from protein sequences. DeepSub achieves an accuracy of up to 0.967, outperforming QUEEN. To validate the effectiveness of DeepSub, we performed predictions for protein homo-oligomers that have been reported in the literature but are not documented in the UniProt database. Examples include homoserine dehydrogenase from , Matrilin-4 from and , and the Multimerins protein family from and . The predicted results agree closely with the findings reported in the literature, underscoring the reliability and utility of DeepSub.
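To illustrate why NS matters for an ecModel, the sketch below shows how a predicted subunit count feeds into an enzyme's MW for a homo-oligomer, where all NS chains are identical: MW(complex) = NS × MW(chain). It is a minimal illustration under stated assumptions, not the authors' pipeline. The function names `chain_mw` and `homo_oligomer_mw` are hypothetical, the residue table uses standard average residue masses (not values taken from this paper), and post-translational modifications, cofactors, and signal-peptide cleavage are ignored.

```python
# Hypothetical helper, not from the DeepSub paper.
# Standard average residue masses in daltons (Da). A chain's average mass is
# the sum of its residue masses plus one water molecule for the free termini.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def chain_mw(sequence: str) -> float:
    """Average MW (Da) of a single polypeptide chain (no modifications)."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def homo_oligomer_mw(sequence: str, n_subunits: int) -> float:
    """MW (Da) of a homo-oligomer made of n_subunits identical chains (NS)."""
    if n_subunits < 1:
        raise ValueError("n_subunits must be >= 1")
    return n_subunits * chain_mw(sequence)


if __name__ == "__main__":
    # Toy sequence; a predicted NS of 2 (homodimer) doubles the chain MW.
    print(round(homo_oligomer_mw("MKTAYIAKQR", 2) / 1000, 3), "kDa")
```

Because MW scales linearly with NS, an incorrect subunit count (for example, treating a homotetramer as a monomer) changes the enzyme's MW in the model by the same factor.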

Authors

  • Rui Deng
    FL 8, Ocean International Center E, Chaoyang Rd Side Rd, ShiLiPu, Chaoyang Qu, 100000 Beijing Shi, China.
  • Ke Wu
    Shanghai Medical Aid Team in Wuhan, Shanghai General Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China.
  • Jiawei Lin
    Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China. 23020161153321@stu.xmu.edu.cn.
  • Dehang Wang
    College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, China.
  • Yuanyuan Huang
    College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, China.
  • Yang Li
    Occupation of Chinese Center for Disease Control and Prevention, Beijing, China.
  • Zhenkun Shi
    Biodesign Center, Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
  • Zihan Zhang
  • Zhiwen Wang
    Institute of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China.
  • Zhitao Mao
    Biodesign Center, Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
  • Xiaoping Liao
    Biodesign Centre, Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China.
  • Hongwu Ma
    Biodesign Centre, Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China. Electronic address: ma_hw@tib.cas.cn.