PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification.

Journal: Journal of Proteome Research
PMID:

Abstract

The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences; the resulting assignments are known as peptide-spectrum matches (PSMs). In recent years, novel algorithms have pushed the assignment to new heights; however, different algorithms come with different strengths and weaknesses, and choosing the appropriate algorithm poses a challenge for the user. To alleviate this issue, we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier. Additionally, PeptideForest increases the number of PSMs with a q-value below 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and proteomes. However, an increase in quantity does not necessarily reflect an increase in quality; we therefore devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. With this approach, we show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality, and PeptideForest thus offers a way to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.
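The 1% q-value cutoff used above to count accepted matches can be illustrated with a minimal sketch. The abstract does not state how PeptideForest computes q-values, so the code assumes the common target-decoy estimate (FDR at a score threshold = decoys / targets above that threshold, and q-value = the lowest FDR at which a match is still accepted). The function names `q_values` and `count_accepted_targets` and the `(score, is_decoy)` input format are hypothetical and are not taken from PeptideForest or Ursgal.

```python
from typing import List, Tuple

# Hypothetical input format: one (score, is_decoy) pair per match,
# where a higher score means a better match.
Match = Tuple[float, bool]


def q_values(matches: List[Match]) -> List[float]:
    """Return one q-value per match, in input order.

    Assumption: simple target-decoy estimate, FDR = decoys / targets,
    computed over all matches scoring at least as well as the current one.
    Score ties are not treated specially in this sketch.
    """
    # Indices of matches sorted from best to worst score.
    order = sorted(range(len(matches)), key=lambda i: matches[i][0], reverse=True)

    fdrs = []
    targets = 0
    decoys = 0
    for i in order:
        if matches[i][1]:
            decoys += 1
        else:
            targets += 1
        # Guard against division by zero when only decoys were seen so far.
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the minimum FDR over this and all more permissive
    # thresholds, so walk from the worst score back to the best.
    result = [0.0] * len(matches)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        result[order[rank]] = running_min
    return result


def count_accepted_targets(matches: List[Match], threshold: float = 0.01) -> int:
    """Count target matches whose q-value is below the threshold."""
    return sum(
        1
        for (_, is_decoy), q in zip(matches, q_values(matches))
        if not is_decoy and q < threshold
    )


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    print(q_values(example))
    print(count_accepted_targets(example, threshold=0.01))
```

Comparing `count_accepted_targets` for two scoring schemes on the same spectra gives the kind of relative gain the abstract reports, although the published figure rests on the authors' own pipeline rather than on this sketch.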

Authors

  • Tristan Ranff
    Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany.
  • Matthew Dennison
    Minds.ai, Santa Cruz, California 95060, United States.
  • Jeroen Bédorf
    Minds.ai, Santa Cruz, California 95060, United States.
  • Stefan Schulze
    Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States.
  • Nico Zinn
    Cellzome, A GSK Company, Heidelberg 69117, Germany.
  • Marcus Bantscheff
    Cellzome, A GSK Company, Heidelberg 69117, Germany.
  • Jasper J R M van Heugten
    Minds.ai, Santa Cruz, California 95060, United States.
  • Christian Fufezan
    Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany.