MIST-CF: Chemical Formula Inference from Tandem Mass Spectra.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF (Metabolite Inference with Spectrum Transformers for Chemical Formula prediction), which learns to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum. Importantly, MIST-CF learns in a data-dependent fashion using a Formula Transformer neural network architecture and circumvents the need for fragmentation tree construction. We train and evaluate our model on a large open-access database, showing an absolute improvement of 10% in top-1 accuracy over other neural network architectures. We further validate our approach on the CASMI2022 challenge data set, achieving nearly equivalent performance to the winning entry within the positive mode category without any manual curation or postprocessing of our results. These results demonstrate an exciting strategy to more powerfully leverage MS2 fragment peaks for predicting MS1 precursor chemical formulas with data-driven learning.
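
To make the ranking task concrete, the following Python sketch shows the general shape of the problem: candidate (formula, adduct) pairs whose theoretical m/z falls within a mass tolerance of the observed precursor are scored by an energy function and sorted, with lower energy ranked first. This is an illustration only and not the MIST-CF implementation. The function names, the 10 ppm tolerance, the two adducts, and especially `placeholder_energy` (which uses only the precursor mass error and ignores the MS2 peaks) are assumptions made for this example; in MIST-CF the energy would come from the trained Formula Transformer applied to the spectrum.

```python
from typing import Callable, Dict, List, Tuple

# Monoisotopic masses (Da) for a few common elements.
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
# m/z shifts for two illustrative singly charged positive-mode adducts
# (cation mass minus one electron). The adduct list is an assumption.
ADDUCT_SHIFT = {"[M+H]+": 1.00727646688, "[M+Na]+": 22.98922070}

Formula = Dict[str, int]


def formula_mass(formula: Formula) -> float:
    """Monoisotopic mass of a neutral formula given as element counts."""
    return sum(MONO_MASS[el] * n for el, n in formula.items())


def formula_to_str(formula: Formula) -> str:
    """Render element counts as a string, in the dict's insertion order."""
    return "".join(f"{el}{n}" for el, n in formula.items())


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def placeholder_energy(formula: Formula, adduct: str, precursor_mz: float) -> float:
    """Hypothetical stand-in for a learned energy: absolute precursor ppm error.

    A learned model would instead score the candidate against the MS2 peaks.
    """
    theoretical = formula_mass(formula) + ADDUCT_SHIFT[adduct]
    return abs(ppm_error(precursor_mz, theoretical))


def rank_candidates(
    precursor_mz: float,
    candidates: List[Formula],
    energy: Callable[[Formula, str, float], float],
    tol_ppm: float = 10.0,
) -> List[Tuple[float, str, str]]:
    """Keep (formula, adduct) pairs within tol_ppm and sort by ascending energy."""
    scored = []
    for formula in candidates:
        for adduct, shift in ADDUCT_SHIFT.items():
            theoretical = formula_mass(formula) + shift
            if abs(ppm_error(precursor_mz, theoretical)) <= tol_ppm:
                scored.append(
                    (energy(formula, adduct, precursor_mz), formula_to_str(formula), adduct)
                )
    return sorted(scored)


if __name__ == "__main__":
    candidates = [{"C": 6, "H": 12, "O": 6}, {"C": 7, "H": 16, "O": 5}]
    for e, formula, adduct in rank_candidates(181.0707, candidates, placeholder_energy):
        print(f"{formula} {adduct} energy={e:.3f}")
```

Passing the energy function as an argument keeps the candidate filtering separate from the scoring, so a learned scorer could replace the placeholder without changing the ranking loop.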

Authors

  • Samuel Goldman
    MIT Computational and Systems Biology, Cambridge, Massachusetts, United States of America.
  • Jiayi Xin
    Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China.
  • Joules Provenzano
    Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States.
  • Connor W Coley
    Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.