To Fly, or Not to Fly, That Is the Question: A Deep Learning Model for Peptide Detectability Prediction in Mass Spectrometry

Journal: Journal of Proteome Research
Published Date:

Abstract

Identifying detectable peptides, known as flyers, is key in mass spectrometry (MS)-based proteomics. Peptide detectability is strongly related to peptide sequence and the resulting physicochemical properties. Moreover, the high variability in MS data hinders the development of a generic model for detectability prediction, underlining the need for customizable tools. We present Pfly, a deep learning model developed to predict peptide detectability based solely on peptide sequence. Pfly is a versatile and reliable state-of-the-art tool, offering high performance, accessibility, and easy customizability for end-users. This adaptability allows researchers to tailor Pfly to specific experimental conditions, improving accuracy and expanding applicability across various research fields. Pfly is an encoder-decoder with an attention mechanism that classifies peptides as flyers or non-flyers, providing both a binary flyer probability and categorical probabilities for four distinct classes defined in this study. The model was initially trained on a synthetic peptide library and subsequently fine-tuned with a biological dataset to mitigate bias toward synthesizability. Fine-tuning improved its predictive capacity, and Pfly outperformed state-of-the-art predictors in benchmark comparisons across different human and cross-species datasets. The study further investigates the influence of protein abundance and rescoring, illustrating how misclassification negatively affects peptide identification. Pfly has been integrated into the DLOmix framework and is accessible on GitHub at https://github.com/wilhelm-lab/dlomix.
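The abstract states that Pfly works from the peptide sequence alone and reports both a binary flyer probability and probabilities over four classes. The minimal sketch below illustrates only the two ends of such a pipeline in plain Python: turning a sequence into fixed-length integer tokens that a sequence model can consume, and collapsing four-class probabilities into a single flyer probability. The amino-acid vocabulary, the padding scheme, the default maximum length of 40, the class names and their order, and the rule "flyer probability = 1 - P(non-flyer)" are all assumptions for illustration; they are not Pfly's or DLOmix's actual preprocessing or API, and the attention-based encoder-decoder itself is not reproduced here.

```python
import math
from typing import Dict, List, Sequence

# Assumed vocabulary: the 20 canonical amino acids; token 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

# Hypothetical class names and order (index 0 assumed to be the non-flyer class).
CLASS_NAMES = ["non-flyer", "weak flyer", "intermediate flyer", "strong flyer"]


def encode_peptide(sequence: str, max_length: int = 40) -> List[int]:
    """Map a peptide to a fixed-length list of integer tokens, right-padded with 0.

    Raises ValueError if the peptide is longer than max_length and KeyError
    for residues outside the assumed 20-letter vocabulary.
    """
    if len(sequence) > max_length:
        raise ValueError(f"peptide longer than max_length={max_length}")
    tokens = [AA_TO_INDEX[aa] for aa in sequence.upper()]
    return tokens + [0] * (max_length - len(tokens))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: raw scores -> probabilities summing to 1."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flyer_probability(class_probs: Sequence[float]) -> float:
    """Binary flyer probability, assuming index 0 holds P(non-flyer)."""
    return 1.0 - class_probs[0]


if __name__ == "__main__":
    tokens = encode_peptide("PEPTIDEK", max_length=12)
    # Placeholder logits standing in for a model's output; not real Pfly predictions.
    probs = softmax([0.2, 1.0, 0.5, -0.3])
    print(tokens)
    print(dict(zip(CLASS_NAMES, (round(p, 3) for p in probs))))
    print(round(flyer_probability(probs), 3))
```

Deriving the binary score from the categorical output keeps the two probabilities consistent with each other by construction; a model could instead use a separate binary output head, and which choice Pfly makes should be checked in the DLOmix repository.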

Authors

  • Naim Abdul-Khalek
    Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, Aalborg 9220, Denmark.
  • Mario Picciani
    Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
  • Omar Shouman
    Computational Mass Spectrometry, Technical University of Munich, 85354 Freising, Germany.
  • Reinhard Wimmer
    Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, Aalborg 9220, Denmark.
  • Michael Toft Overgaard
    Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark.
  • Mathias Wilhelm
    Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany.
  • Simon Gregersen Echers
    Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, Aalborg 9220, Denmark. Electronic address: sgr@bio.aau.dk.