MS4MS: LLMs-driven Multi-agent System for Small-molecule Identification via LC-MS/MS

Journal: bioRxiv
Published Date:

Abstract

Small-molecule identification is central to research fields such as drug discovery. In complex systems such as Traditional Chinese Medicine (TCM), however, traditional mass spectrometry analysis remains constrained by insufficient database coverage, fragmented analysis workflows, and poor interpretability of results. To address these limitations, we developed MS4MS, a large language model (LLM)-driven multi-agent system that provides an end-to-end automated pipeline from raw data to small-molecule identification. Validation on a public benchmark demonstrates that MS4MS achieves state-of-the-art performance in molecular formula prediction. Furthermore, its small-molecule identification agent enables efficient and interpretable compound elucidation. Verification using herbal extracts indicates that MS4MS performs strongly in both analytical coverage and isomer discrimination. MS4MS thus offers a novel, accurate, interpretable, and high-throughput end-to-end automated strategy for small-molecule identification, overcoming the analytical bottlenecks of traditional mass spectrometry in natural products and complex TCM systems.

Authors

  • Na Guo; Jianbin Guo; Yizhe Liu; Sibo Wei; LiFeng Dong; Hongmin Du; Yang Bai; Yurou Zhao; Xiaoqing Wang; Dajun Zeng; Hongjun Yang