Searching molecular structure databases with tandem mass spectra using CSI:FingerID.

Journal: Proceedings of the National Academy of Sciences of the United States of America
Published Date:

Abstract

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.
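
The final step described above, searching a structure database with a predicted fingerprint, can be illustrated with a minimal sketch. This is not the scoring used by CSI:FingerID: it assumes the predicted fingerprint arrives as a list of per-bit probabilities, thresholds it at 0.5, and ranks candidates by plain Tanimoto similarity. The names tanimoto and rank_candidates, the candidate labels, and all bit values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_candidates(
    predicted_probs: List[float],
    candidates: Dict[str, List[int]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Binarize the predicted fingerprint and sort candidates by similarity.

    Assumption: a simple threshold plus Tanimoto ranking stands in for
    the actual scoring function, which this sketch does not reproduce.
    """
    predicted = [1 if p >= threshold else 0 for p in predicted_probs]
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical predicted bit probabilities for an unknown compound.
predicted_probs = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6]

# Hypothetical fingerprints of candidate structures from a database.
candidates = {
    "candidate_A": [1, 1, 0, 1, 0, 1],
    "candidate_B": [1, 0, 1, 0, 0, 1],
    "candidate_C": [0, 0, 1, 0, 1, 0],
}

ranking = rank_candidates(predicted_probs, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the candidate list would be restricted first, for example to database entries with the molecular formula determined for the unknown compound, before any fingerprint comparison is made.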

Authors

  • Kai Dührkop
    Chair for Bioinformatics, Friedrich Schiller University, 07743 Jena, Germany
  • Huibin Shen
    Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, 02150 Espoo, Finland
  • Marvin Meusel
    Chair for Bioinformatics, Friedrich Schiller University, 07743 Jena, Germany
  • Juho Rousu
    Department of Computer Science, Aalto University, 00076 Aalto, Finland; juho.rousu@aalto.fi
  • Sebastian Böcker
    Chair for Bioinformatics, Friedrich Schiller University, 07743 Jena, Germany; sebastian.boecker@uni-jena.de