Machine Learning for Enhanced Identification Probability in RPLC/HRMS Nontargeted Workflows.

Journal: Analytical Chemistry
Published Date:

Abstract

In HRMS-based nontargeted analysis (NTA), spectral matching is crucial for chemical identification, particularly in the absence of retention information. This study introduces the class probability of true positives as an innovative approach, leveraging data from MS/MS spectra and calibrant-free predicted retention time indices (RTIs) through three machine learning (ML) models to enhance identification probability (IP). The first model is a molecular fingerprint (MF)-to-RTI model trained on 4713 calibrants. The second, a cumulative neutral loss (CNL)-to-RTI model, used 485,577 experimental spectra. The final model, a binary classification model, was trained using 1,686,319 true positive (TP) and semisynthetic true negative (TN) spectral matches. High correlations between MF-derived and CNL-derived RTI values (0.96 for training; 0.88 for testing) suggest reduced RTI errors in spectral matches. Incorporating reference spectral library searches and RTI errors, the k-nearest neighbors algorithm achieved a weighted F1 score of 0.65 and a Matthews correlation coefficient of 0.30 for pesticides at concentrations of 1 to 1000 ppb in blank samples, with a recall of 0.60 in black tea matrices. Compared with library matching alone, the average IPs for pesticides increased by 54.5, 52.1, and 46.7% when spiked in blank, 10× diluted, and 100× diluted tea matrices, respectively. This work demonstrates the effectiveness of ML in enhancing the chemical IPs of annotated compounds within complex matrices.
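
The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two hypothetical features per spectral match (a library match score and a scaled absolute RTI error), synthetic data, and k = 5, and it shows how a k-nearest neighbors vote could yield a class probability of true positive, together with how a weighted F1 score and a Matthews correlation coefficient (MCC) are computed from the resulting predictions.

```python
import math
import random


def knn_tp_probability(train, query, k=5):
    """Return the fraction of the k nearest training matches labelled TP (1)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * (tp + fn) / total
    return score


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = TP, 0 = TN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Synthetic, purely illustrative data (assumption): true positive matches get
# higher match scores and smaller RTI errors than true negative matches.
rng = random.Random(0)


def simulate_match(label):
    """Draw one hypothetical (match score, scaled |RTI error|) feature pair."""
    if label == 1:
        return (rng.gauss(0.8, 0.1), abs(rng.gauss(0.05, 0.05)))
    return (rng.gauss(0.5, 0.15), abs(rng.gauss(0.30, 0.15)))


labels = [1, 0] * 100
data = [(simulate_match(y), y) for y in labels]
train, test = data[:150], data[150:]

y_true = [y for _, y in test]
p_tp = [knn_tp_probability(train, x) for x, _ in test]
y_pred = [int(p >= 0.5) for p in p_tp]  # 0.5 threshold is an assumption

print(f"weighted F1 = {weighted_f1(y_true, y_pred):.2f}")
print(f"MCC = {mcc(y_true, y_pred):.2f}")
```

In practice the two features would need to be put on comparable scales before computing distances, since an unscaled RTI error would otherwise dominate the Euclidean distance.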

Authors

  • Hiu-Lok Ngan
    State Key Laboratory of Environmental and Biological Analysis, Department of Chemistry, Hong Kong Baptist University.
  • Viktoriia Turkina
    Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Denice van Herwerden
    Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Hong Yan
  • Zongwei Cai
    State Key Laboratory of Environmental and Biological Analysis, Department of Chemistry, Hong Kong Baptist University, Hong Kong 999077, China. Electronic address: zwcai@hkbu.edu.hk.
  • Saer Samanipour
    Norwegian Institute for Water Research (NIVA), Gaustadalléen 21, 0349 Oslo, Norway; Queensland Alliance for Environmental Health Science (QAEHS), University of Queensland, 20 Cornwall Street, Woolloongabba, QLD 4012, Australia. Electronic address: saer.samanipour@niva.no.
