MAMSI: Integration of Multiassay Liquid Chromatography-Mass Spectrometry Metabolomics Data Using Multiview Machine Learning.

Journal: Analytical chemistry
Published Date:

Abstract

Liquid chromatography-mass spectrometry (LC-MS) is a commonly used analytical technique in untargeted metabolomics. However, the diverse chemical and physical properties of metabolites often require the use of several different analytical assays for broad metabolome coverage. Conventionally, each assay is analyzed separately, but this fails to capture interassay relationships, making multiassay biomarker discovery and data interpretation difficult. Here we propose a workflow to integrate multiassay metabolomics data, designed to enable biomarker discovery and elucidation of unknown metabolites. We employ a multiblock partial least-squares (MB-PLS) model coupled with multiblock variable importance in projection to estimate the importance of each predictor for the outcome variable. We then cluster the selected predictors and compare the clusters to groups defined by structural properties based on retention time and mass-to-charge ratio. To demonstrate and evaluate the approach, we used three multiassay data sets with biological sex, Alzheimer's disease status, and blood bilirubin levels as the outcomes of interest. The MB-PLS models outperformed single-assay models in both classification and regression tasks, indicating that modeling interblock relationships improved the estimate of phenotypic outcome. Additionally, the MB-PLS models provided valuable insight into each data block's contribution to the predicted outcome. Our workflow enabled us to determine a set of potential cross-assay biomarkers. Following putative annotation, the majority of these and their signs of association agreed with results previously reported in the literature. Our workflow has the potential to benefit the metabolomics community and beyond, as it offers interpretable integrative analysis of multiassay LC-MS data and facilitates discovery of potential biomarkers.
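
The modeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' MAMSI implementation. It assumes one common formulation of MB-PLS in which each block is autoscaled, weighted by 1/sqrt(number of variables in the block), and concatenated before an ordinary PLS model is fitted; VIP is then computed from the fitted weights and summarized per block. The synthetic data, the function names `pls1_nipals` and `vip_scores`, the block names, and the per-block summary are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Single-response PLS via NIPALS. Returns weights W, scores T, y-loadings q."""
    X = X.copy()
    y = y.astype(float).copy()
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # scores for this component
        tt = t @ t
        p_load = X.T @ t / tt           # X loadings
        q[a] = (y @ t) / tt             # y loading
        X -= np.outer(t, p_load)        # deflate X
        y -= q[a] * t                   # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """Variable importance in projection for a single-response PLS model."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)                 # y variation explained per component
    w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2   # squared normalized weights
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())


# Hypothetical data: two "assays" measured on the same 60 samples.
rng = np.random.default_rng(0)
n = 60
blocks = {
    "assay_A": rng.normal(size=(n, 20)),
    "assay_B": rng.normal(size=(n, 30)),
}
# Assumed outcome: driven by one feature in each block plus noise.
y = 2.0 * blocks["assay_A"][:, 0] - 1.5 * blocks["assay_B"][:, 3] + 0.1 * rng.normal(size=n)

# Autoscale each block, then down-weight by sqrt(block size) so that
# larger blocks do not dominate the concatenated model.
scaled = []
for X in blocks.values():
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scaled.append(Xs / np.sqrt(X.shape[1]))
X_all = np.hstack(scaled)

W, T, q = pls1_nipals(X_all, y - y.mean(), n_components=2)
vip = vip_scores(W, T, q)

# Illustrative per-block summary: share of total squared VIP in each block.
block_share = {}
start = 0
for name, X in blocks.items():
    stop = start + X.shape[1]
    block_share[name] = float(np.sum(vip[start:stop] ** 2) / vip.size)
    start = stop
print(block_share)
```

By construction, the squared VIP values sum to the number of predictors, so the block shares sum to one. A common heuristic treats predictors with VIP above 1 as candidates for further inspection; whether that threshold matches the selection rule used in the paper is not stated in the abstract.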

Authors

  • Lukas Kopecky
    Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London W12 0NN, U.K.
  • Caroline J Sands
    National Phenome Centre, Department of Metabolism, Digestion and Reproduction, Imperial College London, London W12 0NN, U.K.
  • María Gómez-Romero
    National Phenome Centre, Department of Metabolism, Digestion and Reproduction, Imperial College London, London W12 0NN, U.K.
  • Shivani Misra
    Metabolic Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London W12 0NN, U.K.
  • Elizabeth J Want
    Section of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London W12 0NN, U.K.
  • Timothy M D Ebbels
    Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London W12 0NN, U.K.