Neural Spectral Prediction for Structure Elucidation with Tandem Mass Spectrometry.

Journal: bioRxiv : the preprint server for biology
Published Date:

Abstract

Structural elucidation using untargeted tandem mass spectrometry (MS/MS) has played a critical role in advancing scientific discovery [1, 2]. However, differentiating molecular fragmentation patterns between isobaric structures remains a prominent challenge in metabolomics [3-10], drug discovery [11-13], and reaction screening [14-17], presenting a significant barrier to the cost-effective and rapid identification of unknown molecular structures. Here, we present a geometric deep learning model, ICEBERG, that simulates collision-induced dissociation in mass spectrometry to generate chemically plausible fragments and their relative intensities with awareness of collision energies and polarities. We utilize ICEBERG predictions to facilitate structure elucidation by ranking a set of candidate structures based on the similarity between their predicted MS/MS spectra and an experimental MS/MS spectrum of interest. This integrated elucidation pipeline enables state-of-the-art performance in compound annotation, with 40% top-1 accuracy on the NIST'20 [M+H]+ adduct subset and with 92% of correct structures appearing in the top ten predictions in the same dataset. We demonstrate several real-world case studies, including identifying clinical biomarkers of depression and tuberculous meningitis, annotating an aqueous abiotic degradation product of the pesticide thiophanate methyl, disambiguating isobaric products in pooled reaction screening, and annotating biosynthetic pathways in . Overall, this deep learning-based, chemically interpretable paradigm for structural elucidation enables rapid molecular annotation from complex mixtures, driving discoveries across diverse scientific domains.
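
The candidate-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so the binned cosine similarity below is an assumption, as are the function names, the 0.1 m/z bin width, and the toy peak lists. The predicted spectra stand in for model output and are not produced by ICEBERG here.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.1):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.1):
    """Cosine similarity between two binned spectra (assumed metric)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def rank_candidates(experimental, predicted_by_candidate, bin_width=0.1):
    """Return (candidate, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(experimental, spectrum, bin_width))
        for name, spectrum in predicted_by_candidate.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one experimental spectrum and two predicted spectra.
experimental = [(91.05, 100.0), (119.04, 40.0)]
predicted = {
    "candidate_A": [(91.05, 90.0), (119.04, 50.0)],
    "candidate_B": [(77.04, 100.0), (105.03, 30.0)],
}
ranking = rank_candidates(experimental, predicted)
```

In this toy case candidate_A shares both peaks with the experimental spectrum and ranks first, while candidate_B shares none. Top-k accuracy of the kind reported in the abstract would then be the fraction of query spectra whose true structure appears among the first k ranked candidates.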

Authors

  • Runzhong Wang
  • Mrunali Manjrekar
  • Babak Mahjour
  • Julian Avila-Pacheco
    Broad Institute, Cambridge, Massachusetts, USA.
  • Joules Provenzano
    Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States.
  • Erin Reynolds
  • Magdalena Lederbauer
  • Eivgeni Mashin
  • Samuel Goldman
    MIT Computational and Systems Biology, Cambridge, Massachusetts, United States of America.
  • Mingxun Wang
    Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States.
  • Jing-Ke Weng
    Institute for Plant-Human Interface, Northeastern University, Boston, MA 02115; Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115; Department of Bioengineering, Northeastern University, Boston, MA 02115; Department of Chemical Engineering, Northeastern University, Boston, MA 02115. Electronic address: jingke.weng@northeastern.edu.
  • Desirée L Plata
  • Clary B Clish
    Broad Institute, Cambridge, Massachusetts, USA.
  • Connor W Coley
    Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.