Exploring the potential of machine learning to understand the occurrence and health risks of haloacetic acids in a drinking water distribution system.

Journal: The Science of the total environment
PMID:

Abstract

Determining the occurrence of disinfection byproducts (DBPs) in drinking water distribution systems (DWDSs) remains challenging. Predicting DBPs from readily available water quality parameters (WQPs) can help clarify DBP-associated risks and capture the complex interrelationships between water quality and DBP occurrence. In this study, we collected drinking water samples from a distribution network over one year and measured the related WQPs and haloacetic acids (HAAs). Twelve machine learning (ML) algorithms were evaluated, and Random Forest (RF) performed best at predicting HAA concentrations (R of 0.78 and RMSE of 7.74). Instead of using cytotoxicity or genotoxicity separately as a surrogate for HAA-associated toxicity, we created a health risk index (HRI), calculated as the sum of the cytotoxicity and genotoxicity of HAAs following the widely used Tic-Tox approach. ML models were likewise developed to predict the HRI, and the RF model again performed best, with R of 0.69 and RMSE of 0.38. To further explore advanced ML approaches, we developed three models using uncertainty-based active learning. The Categorical Boosting Regression (CAT) model developed through active learning substantially outperformed the other models, achieving R of 0.87 and 0.82 for predicting HAA concentration and the HRI, respectively. Feature importance analysis with the CAT model revealed that temperature, ions (e.g., chloride and nitrate), and dissolved organic carbon (DOC) concentration in the distribution network had a significant impact on HAA occurrence. Meanwhile, chloride ion, pH, oxidation-reduction potential (ORP), and free chlorine were identified as the most important features for HRI prediction. This study demonstrates that ML has potential for predicting HAA occurrence and toxicity. By identifying the key WQPs that affect HAA occurrence and toxicity, this research offers valuable insights for targeted DBP mitigation strategies.
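The HRI in the abstract combines two toxicity endpoints additively. The following is a minimal sketch of that idea. It assumes a common toxicity-weighting convention in which each HAA's concentration is divided by its cytotoxic or genotoxic potency (for example, an LC50) before the per-species terms are summed. The abstract does not give the exact normalization the authors used, and the `HAAMeasurement` structure, the function names, and any potency values are hypothetical illustrations rather than measured or literature data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HAAMeasurement:
    """One HAA species in a sample (hypothetical structure for illustration).

    Concentration and both potencies must share the same units
    (e.g., mol/L) so that each ratio below is dimensionless.
    """
    name: str
    concentration: float          # measured concentration in the sample
    cytotoxic_potency: float      # assumed cytotoxicity potency, e.g. an LC50
    genotoxic_potency: float      # assumed genotoxicity potency


def toxicity_weighted(concentration: float, potency: float) -> float:
    """Return concentration / potency; a lower potency value means a more toxic species."""
    if potency <= 0:
        raise ValueError("potency must be positive")
    return concentration / potency


def cytotoxicity_index(haas: list[HAAMeasurement]) -> float:
    """Sum of toxicity-weighted concentrations using the cytotoxic potencies."""
    return sum(toxicity_weighted(h.concentration, h.cytotoxic_potency) for h in haas)


def genotoxicity_index(haas: list[HAAMeasurement]) -> float:
    """Sum of toxicity-weighted concentrations using the genotoxic potencies."""
    return sum(toxicity_weighted(h.concentration, h.genotoxic_potency) for h in haas)


def health_risk_index(haas: list[HAAMeasurement]) -> float:
    """Assumed HRI: cytotoxicity index plus genotoxicity index."""
    return cytotoxicity_index(haas) + genotoxicity_index(haas)


if __name__ == "__main__":
    # Placeholder numbers only; not real HAA concentrations or potencies.
    sample = [
        HAAMeasurement("species_A", 2.0, 4.0, 8.0),
        HAAMeasurement("species_B", 3.0, 6.0, 12.0),
    ]
    print(health_risk_index(sample))
```

Summing the two indices weights cytotoxicity and genotoxicity equally. Whether the study applied any additional scaling between the endpoints is not stated in the abstract.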

Authors

  • Ying Yu
    School of Chemistry and Environment, Guangzhou Key Laboratory of Analytical Chemistry for Biomedicine, South China Normal University, Guangzhou 510006, PR China. Electronic address: yuyhs@scnu.edu.cn.
  • Md Mahjib Hossain
    Department of Civil and Environmental Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA.
  • Rabbi Sikder
    Department of Civil and Environmental Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA.
  • Zhenguo Qi
    Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China.
  • Lixin Huo
    Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China.
  • Ruya Chen
    School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou 310018, Zhejiang, China. Electronic address: chenruya2021@163.com.
  • Wenyue Dou
    Key Laboratory of Industrial Pollution Control and Reuse of Jiangsu Province, College of Environmental Engineering, Xuzhou University of Technology, Xuzhou 221018, China.
  • Baoyou Shi
    Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China.
  • Tao Ye
    Ministry of Education Key Laboratory of Micro and Nano Systems for Aerospace, School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an 710072, China.