Quantitative structure retention relationship (QSRR) modelling for Analytes' retention prediction in LC-HRMS by applying different Machine Learning algorithms and evaluating their performance.

Journal: Journal of chromatography. B, Analytical technologies in the biomedical and life sciences
Published Date:

Abstract

In metabolomics, retention prediction methods have been developed based on the structural and physicochemical characteristics of analytes. Such methods employ regression models built with machine learning algorithms that map the experimentally derived retention times (t) of analytes to various structural and physicochemical descriptors; these are known as Quantitative Structure Retention Relationship (QSRR) models. In the present study, QSRR models were developed by applying four Machine Learning regression algorithms, i.e. Bayesian Ridge Regression (BRidgeR), Extreme Gradient Boosting Regression (XGBR), and Support Vector Regression (SVR) with a linear and with a non-linear kernel. All were tested and compared for their retention prediction ability on experimentally derived and on publicly available chromatographic data, using Molecular Descriptors to describe the physical, chemical or structural properties of molecules. Various configurations of the available datasets were analyzed in parallel, differing in the level of highly correlated features they contained (defined as the maximum absolute value of the Pearson's correlation coefficient calculated between any pair of features). To the best of our knowledge, this is the first study of the effect of collinearity on the performance of QSRR predictive models. In the vast majority of cases studied, there was no statistically significant difference in the performance of the generated QSRR predictive models among the specified dataset configurations, which indicates that the selected regression algorithms handle collinearity effectively. As for the individual performance of the selected regression algorithms, no algorithm (or class of algorithms) stood out significantly from the others across the study datasets.
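
The collinearity level used to define the dataset configurations (the maximum absolute Pearson correlation between any pair of features) can be illustrated with a minimal Python sketch. The abstract does not say how the configurations were constructed, so the greedy filter `cap_collinearity`, the threshold values, and the toy descriptor names and numbers below are all assumptions made for illustration, not the authors' procedure or data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant one gives zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def max_abs_correlation(features):
    """Collinearity level: largest |r| over all pairs of feature columns."""
    return max(
        abs(pearson(features[a], features[b]))
        for a, b in combinations(features, 2)
    )


def cap_collinearity(features, threshold):
    """Hypothetical greedy filter: keep a feature only if its |r| with every
    feature kept so far does not exceed the threshold."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


# Toy descriptor table (invented values): one column per molecular descriptor,
# one entry per analyte. mol_weight and heavy_atoms are nearly collinear.
descriptors = {
    "logP": [1.2, 3.0, 0.5, 2.2, -0.3, 1.8],
    "mol_weight": [180.0, 150.0, 320.0, 250.0, 120.0, 210.0],
    "heavy_atoms": [13, 11, 23, 18, 9, 15],
    "tpsa": [63.6, 20.2, 95.0, 40.5, 37.3, 74.6],
}

# Build one dataset configuration per (assumed) collinearity threshold.
for threshold in (1.0, 0.9, 0.5):
    config = cap_collinearity(descriptors, threshold)
    print(threshold, sorted(config), round(max_abs_correlation(config), 3))
```

Each resulting configuration could then be passed to the same set of regressors, so that their prediction errors can be compared across collinearity levels. The greedy filter depends on column order, which is one reason to treat it only as a sketch.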

Authors

  • T Liapikos
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece; Biomic_AUTh, Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, B1.4, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece. Electronic address: tliapikos@chem.auth.gr.
  • C Zisi
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece; Biomic_AUTh, Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, B1.4, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece.
  • D Kodra
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece; Biomic_AUTh, Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, B1.4, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece.
  • K Kademoglou
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece; Biomic_AUTh, Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, B1.4, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece.
  • D Diamantidou
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece; Biomic_AUTh, Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, B1.4, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece.
  • O Begou
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece; Biomic_AUTh, Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, B1.4, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece.
  • A Pappa-Louisi
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece.
  • G Theodoridis
    Department of Chemistry, Aristotle University of Thessaloniki, 541 24, Thessaloniki, Greece; Biomic_AUTh, Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, B1.4, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece.