π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing.

Journal: Nature Communications
PMID:

Abstract

Peptide sequencing via tandem mass spectrometry (MS/MS) is essential in proteomics. Unlike traditional database searches, deep learning excels at de novo peptide sequencing, even for peptides missing from existing databases. Current deep learning models often rely on autoregressive generation, which suffers from error accumulation and slow inference speeds. In this work, we introduce π-PrimeNovo, a non-autoregressive Transformer-based model for peptide sequencing. With our architecture design and a CUDA-enhanced decoding module for precise mass control, π-PrimeNovo achieves significantly higher accuracy and up to 89x faster inference than state-of-the-art methods, making it ideal for large-scale applications like metaproteomics. Additionally, it excels in phosphopeptide mining and detecting low-abundance post-translational modifications (PTMs), marking a substantial advance in peptide sequencing with broad potential in biological research.
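The "precise mass control" mentioned in the abstract refers to a general constraint in de novo sequencing: the residue masses of a predicted peptide, plus one water, should match the neutral mass implied by the precursor m/z and charge. The sketch below illustrates only that constraint as a simple after-the-fact check. It is not the paper's CUDA decoding module; the function names, the reduced residue table, and the 20 ppm default tolerance are assumptions made for illustration.

```python
# Illustrative sketch only: a post-hoc precursor-mass check, not the
# decoding module described in the paper. All names here are hypothetical.

# Monoisotopic residue masses in Da (standard values, subset for brevity).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
    "E": 129.04259,
}
WATER = 18.01056   # mass of H2O added to the sum of residues
PROTON = 1.00728   # mass of one proton (charge carrier)


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mass(mz, charge):
    """Neutral mass implied by an observed precursor m/z and charge."""
    return (mz - PROTON) * charge


def within_tolerance(seq, mz, charge, ppm=20.0):
    """True if the candidate's mass matches the precursor within `ppm`."""
    observed = precursor_mass(mz, charge)
    theoretical = peptide_mass(seq)
    return abs(theoretical - observed) / observed * 1e6 <= ppm
```

For example, a candidate "GASP" has a neutral mass of about 330.154 Da, so it passes the check against a doubly charged precursor at m/z 166.08424, while "GASV" (about 2 Da heavier) does not.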

Authors

  • Xiang Zhang
    Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan, China.
  • Tianze Ling
    Tsinghua University, Beijing, China.
  • Zhi Jin
  • Sheng Xu
    School of Physics and Information Engineering, Jiangsu Second Normal University, Nanjing, 211200, China.
  • Zhiqiang Gao
    Beijing Entry-Exit Inspection and Quarantine Bureau, Beijing 100026, China.
  • Boyan Sun
    School of Civil and Hydraulic Engineering, Ningxia University, Yinchuan, China.
  • Zijie Qiu
    Shanghai Artificial Intelligence Laboratory, Shanghai, China.
  • Jiaqi Wei
    Shanghai Artificial Intelligence Laboratory, Shanghai, China.
  • Nanqing Dong
  • Guangshuai Wang
    Shanghai Artificial Intelligence Laboratory, Shanghai, China.
  • Guibin Wang
    State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China.
  • Leyuan Li
    State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China.
  • Muhammad Abdul-Mageed
    University of British Columbia, Vancouver, BC, Canada.
  • Laks V S Lakshmanan
  • Fuchu He
    Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, China.
  • Wanli Ouyang
    Shanghai AI Laboratory, Shanghai, China.
  • Cheng Chang
    Beijing Institute of Lifeomics, Beijing 102206, China.
  • Siqi Sun
    Toyota Technological Institute at Chicago, Chicago, IL 60615, USA.