Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction.

Journal: Environmental pollution (Barking, Essex : 1987)

Abstract

The retention time (RT) of contaminants of emerging concern (CECs) in liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is crucial for database matching in non-targeted screening (NTS) analysis. In this study, we developed a machine learning (ML) model to predict the RTs of CECs for NTS analysis. Using 1051 CEC standards, we evaluated Random Forest (RF), XGBoost, Support Vector Regression (SVR), and Artificial Neural Network (ANN) models, with molecular fingerprints and chemical descriptors as inputs, to identify the optimal model. The SVR model based on chemical descriptors showed good predictive capacity, with R² = 0.850 and r = 0.925. The model was further validated through laboratory NTS compound characterization. When the workflow was applied to examine CEC occurrence in a large wastewater treatment plant, we identified 40 level S1 CECs (structure confirmed by reference standard) and 234 level S2 compounds (probable structure by library spectrum match). Comparing the model-predicted RTs of the level S2 compounds with their measured RTs allowed 153 of them to be classified with high confidence (ΔRT < 2 min). The model thus served as a robust filter within the analytical framework. This study emphasizes the importance of predicted RTs in NTS analysis and highlights the potential of prediction models. Our research introduces a workflow that enhances NTS analysis by using RT prediction models to determine compound confidence levels.
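The ΔRT-based confidence filter and the two reported fit metrics (R² and r) can be sketched as follows. This is a minimal illustration, assuming predicted RTs have already been produced by a trained model; the SVR itself and the descriptor calculation are not reproduced here. The class and function names, compound names and RT values are hypothetical; only the 2 min ΔRT threshold and the choice of R² and r as metrics come from the abstract.

```python
from dataclasses import dataclass
from math import sqrt

# Threshold taken from the abstract: |ΔRT| < 2 min counts as high confidence.
DELTA_RT_THRESHOLD_MIN = 2.0


@dataclass
class Candidate:
    """Hypothetical level S2 annotation with measured and predicted RTs (min)."""
    name: str
    rt_measured: float
    rt_predicted: float

    @property
    def delta_rt(self) -> float:
        """Absolute difference between measured and predicted RT."""
        return abs(self.rt_measured - self.rt_predicted)


def classify(candidates, threshold=DELTA_RT_THRESHOLD_MIN):
    """Split candidates into high-confidence and flagged lists by |ΔRT|."""
    high, flagged = [], []
    for c in candidates:
        (high if c.delta_rt < threshold else flagged).append(c)
    return high, flagged


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical example values, not data from the study.
    cands = [
        Candidate("compound_A", 5.2, 6.0),   # |ΔRT| = 0.8 min
        Candidate("compound_B", 12.8, 9.9),  # |ΔRT| = 2.9 min
        Candidate("compound_C", 8.1, 7.4),   # |ΔRT| = 0.7 min
    ]
    high, flagged = classify(cands)
    print("high confidence:", [c.name for c in high])
    print("flagged:", [c.name for c in flagged])
    measured = [c.rt_measured for c in cands]
    predicted = [c.rt_predicted for c in cands]
    print(f"R2 = {r_squared(measured, predicted):.3f}, "
          f"r = {pearson_r(measured, predicted):.3f}")
```

Reporting both metrics is informative because they generally differ: r captures only the linear association between predicted and measured RTs, whereas R² also penalizes any systematic offset or scaling error in the predictions. The strict `<` comparison mirrors the "ΔRT < 2 min" criterion.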

Authors

  • Dehao Song
    School of Environment and Energy, South China University of Technology, Guangzhou, 510006, China.
  • Ting Tang
    Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, 510642, China.
  • Rui Wang
    Department of Clinical Laboratory Medicine Center, Inner Mongolia Autonomous Region People's Hospital, Hohhot, Inner Mongolia, China.
  • He Liu
    Division of Endodontics, Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC, Canada.
  • Danping Xie
    South China Institute of Environmental Sciences, Ministry of Ecology and Environment, Guangzhou, 510655, China; Guangxi Key Laboratory of Emerging Contaminants Monitoring, Early Warning and Environmental Health Risk Assessment, Nanning, 530000, China.
  • Bo Zhao
    State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
  • Zhi Dang
    School of Environment and Energy, South China University of Technology, Guangzhou, 510006, China; Guangdong Provincial Key Laboratory of Solid Wastes Pollution Control and Recycling, South China University of Technology, Guangzhou, 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, 510006, China.
  • Guining Lu
    School of Environment and Energy, South China University of Technology, Guangzhou, 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, 510006, China.