Uncovering Thousands of New Peptides with Sequence-Mask-Search Hybrid De Novo Peptide Sequencing Framework.

Journal: Molecular & cellular proteomics : MCP
PMID:

Abstract

Typical analyses of mass spectrometry data identify only amino acid sequences that exist in reference databases. This restricts the possibility of discovering new peptides, such as those that contain uncharacterized mutations or originate from unexpected processing of RNAs and proteins. De novo peptide sequencing approaches address this limitation but often suffer from low accuracy and require extensive validation by experts. Here, we develop SMSNet, a deep learning-based de novo peptide sequencing framework that achieves >95% amino acid accuracy while retaining good identification coverage. Applying SMSNet to landmark proteomics and peptidomics studies reveals over 10,000 previously uncharacterized HLA antigens and phosphopeptides and, in conjunction with database-search methods, expands the coverage of peptide identification by almost 30%. SMSNet's ability to accurately identify new peptides would make it an invaluable tool for future proteomics and peptidomics studies, including tumor neoantigen discovery, antibody sequencing, and proteome characterization of non-model organisms.

Authors

  • Korrawe Karunratanakul
    Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand.
  • Hsin-Yao Tang
    Proteomics and Metabolomics Facility, The Wistar Institute, Philadelphia, PA 19104.
  • David W Speicher
    Center for Systems and Computational Biology, The Wistar Institute, Philadelphia, PA 19104.
  • Ekapol Chuangsuwanich
    Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, 254 Phayathai Road, Pathumwan, Bangkok, 10330, Thailand. ekapol.c@chula.ac.th.
  • Sira Sriswasdi
    Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand.