Machine Learning Classifies Core and Outer Fucosylation of N-Glycoproteins Using Mass Spectrometry.

Journal: Scientific Reports
PMID:

Abstract

Protein glycosylation is involved in biological processes such as cell recognition, growth, differentiation, and apoptosis. Fucosylation plays an important role in the structural stability and function of N-linked glycoproteins. Although many biological and clinical studies of protein fucosylation by fucosyltransferases have been reported, structural classification of fucosylated N-glycoproteins into core or outer isoforms remains a challenge. Here, we report for the first time the classification of N-glycopeptides as core- and outer-fucosylated types using tandem mass spectrometry (MS/MS) and machine learning algorithms, namely a deep neural network (DNN) and a support vector machine (SVM). Training and test sets of more than 800 MS/MS spectra of N-glycopeptides from immunoglobulin gamma and alpha-1-acid glycoprotein standards were selected for classification of the fucosylation types using supervised learning models. The best-performing model had an accuracy of more than 99% against manual characterization and area under the curve (AUC) values greater than 0.99, calculated from probability scores for target and decoy datasets. Finally, this model was applied to classify fucosylated N-glycoproteins from human plasma. A total of 82 N-glycopeptides, with 54 core-, 24 outer-, and 4 dual-fucosylation types derived from 54 glycoproteins, were assigned the same type by both the DNN and the SVM. Specifically, outer fucosylation was dominant in tri- and tetra-antennary N-glycopeptides, while core fucosylation was dominant in the mono-antennary, bi-antennary, and hybrid types of N-glycoproteins in human plasma. Thus, machine learning methods can be combined with MS/MS to distinguish between different isoforms of fucosylated N-glycopeptides.
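
The abstract reports area under the curve values calculated from probability scores for target and decoy datasets, but does not spell out the procedure. As a minimal sketch of how such a value can be obtained in general, the Python below computes AUC as the fraction of target/decoy pairs in which the target receives the higher score (the Mann-Whitney formulation). The function name `auc_from_scores` and the score values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def auc_from_scores(target_scores, decoy_scores):
    """Return the AUC for separating targets from decoys.

    Computed as the probability that a randomly chosen target score
    exceeds a randomly chosen decoy score; ties count as half a win.
    """
    if not target_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for t in target_scores:
        for d in decoy_scores:
            if t > d:
                wins += 1.0
            elif t == d:
                wins += 0.5
    return wins / (len(target_scores) * len(decoy_scores))


# Hypothetical classifier probability scores (illustration only).
target = [0.99, 0.97, 0.95, 0.60]
decoy = [0.05, 0.10, 0.70, 0.02]

auc = auc_from_scores(target, decoy)
print(f"AUC = {auc:.4f}")
```

An AUC of 1.0 would mean every target spectrum scores above every decoy, while 0.5 corresponds to no separation. The pairwise loop is quadratic in the number of spectra, which is acceptable for a dataset of a few hundred spectra; a rank-based computation would be preferable for much larger sets.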

Authors

  • Heeyoun Hwang
    Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea.
  • Hoi Keun Jeong
    Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea.
  • Hyun Kyoung Lee
    Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea.
  • Gun Wook Park
    Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea.
  • Ju Yeon Lee
    Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea.
  • Soo Youn Lee
    Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, South Korea.
  • Young-Mook Kang
    Drug Information Platform Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Korea.
  • Hyun Joo An
    Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, 34134, Republic of Korea.
  • Jeong Gu Kang
    Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 34141, Republic of Korea.
  • Jeong-Heon Ko
    Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 34141, Republic of Korea.
  • Jin Young Kim
    Department of Orthopaedic Surgery, Dongguk University Ilsan Hospital, Goyang, Korea.
  • Jong Shin Yoo
    Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea. jongshin@kbsi.re.kr.