Predicting Peptide Ionization Efficiencies for Electrospray Ionization Mass Spectrometry Using Machine Learning.

Journal: Journal of the American Society for Mass Spectrometry
PMID:

Abstract

Mass spectrometry (MS) is inherently an information-rich technique. In this era of big data, label-free MS quantification for nontargeted studies has gained increasing popularity, especially for complex systems. One of the cornerstones of successful label-free quantification is the predictive modeling of ionization efficiency (IE) based on solutes' physicochemical properties. While many have studied IE modeling for small molecules, there are limited reports on peptide IEs. In this study, we leverage the stoichiometric relationship in trypsin digests of well-characterized monoclonal antibodies (mAbs) to compile a data set of relative ionization efficiencies (RIEs) for 241 peptides. From each peptide's sequence, we computed a set of physicochemical descriptors, which were then used to train machine learning regression models to predict RIEs. Peptides shorter than 20 amino acids had RIEs that were highly correlated with their molecular weight. A random forest (RF) model best predicted the RIEs of a test data set, with a mean relative error of 23.9%. For larger peptides, a multilayer perceptron (MLP) model improved RIE prediction compared to current best practices, reducing mean relative error from 60.5% to 32.0%. Finally, we also show the application of the RF model in label-free relative protein quantification and in improving the quantification of peptide post-translational modifications (PTMs). This approach to predicting peptide IEs from their sequences enables the development of accurate label-free quantification workflows for peptide and protein analysis.
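The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and several points are assumptions: the abstract does not give the exact RIE definition, so RIE is taken here as a peptide's peak area divided by that of a reference peptide from the same digest (peptides from one protein being nominally equimolar); mean relative error is taken as the mean of |predicted - measured| / measured, in percent; and the only descriptor computed is monoisotopic molecular weight from standard residue masses. The function names, peptide sequences, and peak areas are hypothetical.

```python
# Standard monoisotopic residue masses in Da (unmodified amino acids only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide from its sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def relative_ionization_efficiencies(peak_areas, reference):
    """Assumed RIE definition: peak area relative to a reference peptide.

    Only meaningful when all peptides are present in equal molar amounts,
    as is nominally the case for peptides from a single digested protein.
    """
    ref_area = peak_areas[reference]
    return {pep: area / ref_area for pep, area in peak_areas.items()}


def mean_relative_error(predicted, measured):
    """Mean of |predicted - measured| / measured, expressed in percent."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical peak areas for three made-up tryptic peptides.
    areas = {"GGR": 1.0e6, "AAVK": 2.5e6, "LSVTEK": 4.0e6}
    ries = relative_ionization_efficiencies(areas, reference="GGR")
    for pep, rie in ries.items():
        print(pep, round(peptide_mass(pep), 4), round(rie, 2))
```

In a fuller workflow, descriptors such as the mass above would form the feature matrix for a regression model (the abstract names random forest and multilayer perceptron models), and `mean_relative_error` would be evaluated on held-out peptides.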

Authors

  • Justin A Kaskow
    David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
  • Eric T Hahnert
    David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
  • Thomas K Porter
    David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
  • Yali Lu
    Analytical Sciences, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, Maryland 20878, United States.
  • Valentin Stanev
    Data Science and Modeling, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, Maryland 20878, United States.
  • Chendi Niu
    Analytical Sciences, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, Maryland 20878, United States.
  • Wei Xu
    College of Food and Bioengineering, Henan University of Science and Technology, Luoyang, 471023 China.
  • Methal Albarghouthi
    Analytical Sciences, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, Maryland 20878, United States.
  • Chunlei Wang
    Analytical Sciences, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, Maryland 20878, United States.