MS2Query: reliable and scalable MS2 mass spectra-based analogue search.

Journal: Nature communications
Published Date:

Abstract

Metabolomics-driven discoveries in biological samples remain hampered by the grand challenge of metabolite annotation and identification. Only a few metabolites have an annotated spectrum in spectral libraries; hence, searching only for exact library matches generally returns few hits. An attractive alternative is to search for so-called analogues as a starting point for structural annotation; analogues are library molecules that are not exact matches but display a high chemical similarity. However, current analogue search implementations are not yet very reliable and are relatively slow. Here, we present MS2Query, a machine learning-based tool that integrates mass spectral embedding-based chemical similarity predictors (Spec2Vec and MS2DeepScore) as well as detected precursor masses to rank potential analogues and exact matches. Benchmarking MS2Query on reference mass spectra and experimental case studies demonstrates improved reliability and scalability. Thereby, MS2Query offers exciting opportunities to further increase the annotation rate of metabolomics profiles of complex metabolite mixtures and to discover new biology.

Authors

  • Niek F de Jonge
    Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, the Netherlands. niek.dejonge@wur.nl.
  • Joris J R Louwen
    Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, the Netherlands.
  • Elena Chekmeneva
    National Phenome Centre, Section of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, London, W12 0NN, UK.
  • Stephane Camuzeaux
    National Phenome Centre, Section of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, London, W12 0NN, UK.
  • Femke J Vermeir
    Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, 6525ED, Nijmegen, the Netherlands.
  • Robert S Jansen
    Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, 6525ED, Nijmegen, the Netherlands.
  • Florian Huber
    Centre for Digitalization and Digitality (ZDD), University of Applied Sciences Düsseldorf, Düsseldorf, Germany. florian.huber@hs-duesseldorf.de.
  • Justin J J van der Hooft
    Bioinformatics Group, Wageningen University, Wageningen, The Netherlands. justin.vanderhooft@wur.nl.