N-GlycoPred: A hybrid deep learning model for accurate identification of N-glycosylation sites.

Journal: Methods (San Diego, Calif.)
Published Date:

Abstract

Studies have shown that protein glycosylation in cells reflects the real-time dynamics of biological processes, and that the occurrence and development of many diseases are closely related to it. Abnormal protein glycosylation can therefore serve as a potential diagnostic and prognostic marker of disease, as a therapeutic target, and as a new entry point for exploring pathogenesis. Previous models show significant differences in prediction results across species. To address this, we constructed N-GlycoPred, a hybrid deep learning model built on dual-layer convolution, a paired attention mechanism, and BiLSTM, for accurate identification of N-glycosylation sites. Using either one-hot encoding or the AAindex, we selected the optimal combination of features and deep learning framework separately for human and mouse to refine the models. On six independent test datasets, N-GlycoPred achieved an average AUC of 0.9553, which is 0.23% higher than that of MusiteDeep. These comparisons indicate that the model can serve as a powerful tool for prescreening N-glycosylation sites for biological researchers.
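
As an illustration of the one-hot encoding mentioned in the abstract, the following minimal sketch encodes a sequence window centered on a candidate asparagine (N). It is not the authors' implementation: the function names, the flank size of 3, the padding character "X", and the example sequence are all assumptions made here for illustration, since the abstract does not specify them.

```python
# Hypothetical sketch of one-hot encoding for a peptide window.
# Window size, padding symbol, and names are illustrative assumptions,
# not details taken from the N-GlycoPred paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def extract_window(sequence, center, flank=3, pad="X"):
    """Return the residues within `flank` positions of `center` (0-based),
    padding with `pad` where the window runs past either end."""
    padded = pad * flank + sequence + pad * flank
    return padded[center:center + 2 * flank + 1]


def one_hot(window):
    """Encode each residue as a 20-element binary vector.
    Padding or non-standard residues become all-zero vectors."""
    rows = []
    for aa in window:
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows


# Made-up example sequence with an asparagine at 0-based position 2.
sequence = "MKNGSAT"
window = extract_window(sequence, center=2, flank=3)
encoded = one_hot(window)  # 7 rows x 20 columns
```

Each window becomes a matrix with one row per residue, which is the kind of input a convolutional or recurrent layer can consume. An AAindex-based encoding would instead replace each binary row with physicochemical property values for that residue.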

Authors

  • Fengzhu Hu
    School of Science, Dalian Maritime University, Dalian 116026, China.
  • Jie Gao
    Department of Nephrology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China.
  • Jia Zheng
    School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
  • Cheekeong Kwoh
    School of Computer Science and Engineering, Nanyang Technological University, Singapore.
  • Cangzhi Jia
    Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China. Electronic address: cangzhijia@dlmu.edu.cn.