Machine learning prediction of DOC-water partitioning coefficients for organic pollutants from diverse DOM origins.

Journal: Environmental Science: Processes & Impacts
Published Date:

Abstract

This study aims to improve the prediction and understanding of dissolved organic carbon-water partitioning coefficients (K_DOC), a crucial parameter in environmental risk assessment. A dataset of 709 datapoints covering 190 unique organic pollutants and various types of dissolved organic matter (DOM) was compiled. Molecular descriptors characterizing each compound's properties and structure were calculated using Multiwfn, PaDEL and RDKit. Separate machine learning models were established for four DOM groupings: all DOM, natural aquatic DOM, natural terrestrial DOM and commercial DOM. These models exhibited excellent goodness-of-fit, internal stability, and predictive performance, with the coefficients for these three criteria exceeding 0.771, 0.602, and 0.629, respectively, and RMSE ranging from 0.413 to 0.580. Shapley additive explanation analysis identified CrippenLogP and MATS2m as the most influential descriptors. CrippenLogP, reflecting hydrophobicity, positively influenced log K_DOC, while MATS2m, characterizing molecular branching and compactness, had a negative effect. Mor29m, for which lower values indicate a higher abundance of heteroatoms such as halogens, also showed a negative impact, likely due to enhanced interactions with polar DOM groups. SlogP_VSA1, another descriptor related to hydrophobicity, correlated positively with log K_DOC in natural aquatic DOM, while its negative correlation in all DOM may reflect the great diversity of DOM properties in that group. Partial dependence plots revealed that organic pollutants tended to partition more into DOM when CrippenLogP > 6, Mor29m was between 0.45 and 0.52, MATS2m < -0.015, and SlogP_VSA1 < 7. These findings support the application of machine learning models for assessing pollutant interactions with DOM, contributing to improved environmental risk predictions.
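The descriptor ranges from the partial dependence analysis can be read as a rough screening rule. The following is a minimal sketch, not the authors' trained models: it only checks which of a compound's precomputed descriptor values fall inside the ranges quoted in the abstract. The names `FAVORABLE_RANGES`, `in_range`, and `favorable_descriptors` are hypothetical, the example descriptor values are invented for illustration, and treating every bound as exclusive is an assumption, since the abstract does not say whether the endpoints are included.

```python
# Descriptor ranges quoted in the abstract's partial dependence analysis.
# Each entry is (lower, upper); None means unbounded on that side.
# Assumption: bounds are treated as exclusive.
FAVORABLE_RANGES = {
    "CrippenLogP": (6.0, None),    # CrippenLogP > 6
    "Mor29m": (0.45, 0.52),        # Mor29m between 0.45 and 0.52
    "MATS2m": (None, -0.015),      # MATS2m < -0.015
    "SlogP_VSA1": (None, 7.0),     # SlogP_VSA1 < 7
}


def in_range(value, bounds):
    """Return True if value lies strictly inside (lower, upper)."""
    lower, upper = bounds
    if lower is not None and value <= lower:
        return False
    if upper is not None and value >= upper:
        return False
    return True


def favorable_descriptors(descriptors):
    """List the descriptors whose values fall in the quoted ranges.

    `descriptors` maps descriptor names to precomputed numeric values;
    descriptors missing from the input are skipped.
    """
    return [
        name
        for name, bounds in FAVORABLE_RANGES.items()
        if name in descriptors and in_range(descriptors[name], bounds)
    ]


# Hypothetical descriptor values for one compound (not from the dataset).
example = {
    "CrippenLogP": 6.8,
    "Mor29m": 0.30,
    "MATS2m": -0.02,
    "SlogP_VSA1": 5.1,
}
print(favorable_descriptors(example))
```

A compound matching more of these ranges would, per the abstract, tend to partition more strongly into DOM, but this check is a qualitative reading of the reported ranges and does not produce a log K_DOC estimate.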

Authors

  • Ruyue Jin
    Senior Department of Pediatrics, The Seventh Medical Center of Chinese PLA General Hospital, No.5 Nanmen Cang Hutong, Dongcheng District, Beijing, People's Republic of China.
  • Yuzhen Liang
    School of Environment and Energy, South China University of Technology, Guangzhou 510006, China.
  • Zhenqing Shi
    School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China.