Deep learning the collisional cross sections of the peptide universe from a million experimental values.

Journal: Nature Communications
Published Date:

Abstract

The size and shape of peptide ions in the gas phase are an under-explored dimension for mass spectrometry-based proteomics. To investigate the nature and utility of the peptide collisional cross section (CCS) space, we measure more than a million data points from whole-proteome digests of five organisms with trapped ion mobility spectrometry (TIMS) and parallel accumulation-serial fragmentation (PASEF). The scale and precision (CV < 1%) of our data are sufficient to train a deep recurrent neural network that accurately predicts CCS values based solely on the peptide sequence. Cross section predictions for the synthetic ProteomeTools peptides validate the model within a 1.4% median relative error (R > 0.99). Hydrophobicity, the proportion of prolines and the position of histidines are the main determinants of the cross sections, in addition to sequence-specific interactions. CCS values can now be predicted for any peptide and organism, forming a basis for advanced proteomics workflows that make full use of the additional information.
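The figures of merit quoted in the abstract (CV, median relative error, R) can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes the relative error is the absolute deviation divided by the measured value, R is the Pearson correlation coefficient, and CV is the sample standard deviation divided by the mean. The function names and all CCS values below are hypothetical and are not data from the study.

```python
import math
import statistics


def median_relative_error(predicted, measured):
    """Median of |predicted - measured| / measured, in percent (assumed definition)."""
    errors = [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return statistics.median(errors)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return statistics.stdev(values) / statistics.fmean(values) * 100.0


# Hypothetical CCS values in square angstroms, for illustration only.
measured = [350.0, 400.0, 450.0, 500.0, 550.0]
predicted = [353.5, 392.0, 454.5, 510.0, 552.75]
replicates = [400.0, 402.0, 398.0]  # repeated measurements of one peptide ion

print(f"median relative error: {median_relative_error(predicted, measured):.2f} %")
print(f"Pearson R: {pearson_r(measured, predicted):.4f}")
print(f"CV across replicates: {coefficient_of_variation(replicates):.2f} %")
```

The median is used rather than the mean so that a few badly predicted peptides do not dominate the summary.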

Authors

  • Florian Meier
    Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
  • Niklas D Köhler
    Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany.
  • Andreas-David Brunner
    Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
  • Jean-Marc H Wanka
    Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany.
  • Eugenia Voytik
    Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
  • Maximilian T Strauss
    Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
  • Fabian J Theis
    Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany.
  • Matthias Mann
    Proteomics and Signal Transduction Group (mmann@biochem.mpg.de).