CSU-MS: A Contrastive Learning Framework for Cross-Modal Compound Identification from MS/MS Spectra to Molecular Structures.

Journal: Analytical Chemistry
Published Date:

Abstract

Tandem mass spectrometry (MS/MS) is a cornerstone of compound identification in complex mixtures, but conventional spectral matching approaches are held back by incomplete reference library coverage and by the limitations of the matching algorithms themselves. To address this, we propose CSU-MS (Contrastive Spectral-Structural Unification framework for MS/MS Spectra and Molecular Structures), a framework that bridges MS/MS spectra and molecular structures through cross-modal contrastive learning. CSU-MS integrates an External Space Attention Aggregation (ESA) module that dynamically aligns spectral and structural features, enabling direct retrieval of molecular candidates from a unified embedding space. The framework is pretrained on large-scale in silico MS/MS data sets generated by CFM-ID and ICEBERG and then fine-tuned on high-quality experimental data. CSU-MS achieves a Recall@1 of 75.45% when matching 1047 spectra against a reference library of 1,001,047 compounds, substantially surpassing existing methods such as CFM-ID (68.38%), SIRIUS (64.85%), MetFrag (48.59%), and CMSSP (30.47%). Validation on three external data sets, spanning human metabolomics (MTBLS265), plant metabolites (PMhub), and the CASMI 2022 challenge, demonstrates robust generalizability, with domain-specific retrieval achieving a Recall@10 of 91.67% for blood metabolites. To facilitate compound identification across domains, we have assembled a Spectrum-searchable Structural Feature Database (SSFDB) from 23 structural databases and deployed an open-source web server that supports customizable cross-modal retrieval. All code, models, and SSFDB are publicly accessible, offering a transformative solution for high-throughput compound identification in metabolomics and beyond.
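The retrieval step described above (ranking library structures by their similarity to a query spectrum in a shared embedding space, then scoring with Recall@k) can be illustrated with a small sketch. This is not the CSU-MS implementation: the encoders, including the ESA module, are not shown, and embeddings are taken as given. Cosine similarity and a CLIP-style symmetric InfoNCE loss with temperature 0.07 are common choices for contrastive alignment and are assumed here rather than confirmed by the abstract; all function names are hypothetical. Recall@k is computed as the fraction of query spectra whose true structure appears among the top k ranked candidates.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_matrix(spectra: Sequence[Sequence[float]],
                  structures: Sequence[Sequence[float]]) -> List[Vector]:
    """Pairwise cosine similarity: rows are spectrum embeddings, columns are structure embeddings."""
    s = [l2_normalize(v) for v in spectra]
    m = [l2_normalize(v) for v in structures]
    return [[sum(a * b for a, b in zip(si, mj)) for mj in m] for si in s]


def symmetric_info_nce(sim: List[Vector], temperature: float = 0.07) -> float:
    """Assumed CLIP-style contrastive loss on a square batch; matching pairs lie on the diagonal."""
    n = len(sim)

    def mean_row_cross_entropy(mat: List[Vector]) -> float:
        total = 0.0
        for i, row in enumerate(mat):
            logits = [x / temperature for x in row]
            peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
            total += log_z - logits[i]  # -log softmax probability of the true pair
        return total / n

    transposed = [list(col) for col in zip(*sim)]
    # Average the spectrum-to-structure and structure-to-spectrum directions.
    return 0.5 * (mean_row_cross_entropy(sim) + mean_row_cross_entropy(transposed))


def recall_at_k(sim: List[Vector], true_index: Sequence[int], k: int) -> float:
    """Fraction of query spectra whose correct structure ranks within the top k candidates."""
    hits = 0
    for row, target in zip(sim, true_index):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(true_index)


if __name__ == "__main__":
    # Toy embeddings: two query spectra and a three-compound library.
    spectra = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
    library = [[0.9, 0.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]
    sim = cosine_matrix(spectra, library)
    print(recall_at_k(sim, [0, 1], k=1))  # prints 1.0
```

The loss needs a square, paired batch during training, whereas retrieval compares each query against the full library, so `recall_at_k` accepts a rectangular similarity matrix. In a library of roughly a million compounds, a real system would precompute and index the structure embeddings rather than sorting every row in pure Python.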

Authors

  • Ting Xie
    Department of Biomedical Engineering, School of Basic Medical Science, Central South University, 410013, Changsha, Hunan, China.
  • Hailiang Zhang
    Department of Urology, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
  • Qiong Yang
    Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000, China.
  • Jinyu Sun
    College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China.
  • Yue Wang
    Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
  • Jia Long
    College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, P.R. China.
  • Zhimin Zhang
    School of Control Science and Engineering, Shandong University, Jinan, People's Republic of China. School of Information Technology and Electrical Engineering, University of Queensland, Queensland, Australia.
  • Hongmei Lu
    College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China.