High-throughput prediction of oral acute toxicity in Rat and Mouse of over 100,000 polychlorinated persistent organic pollutants (PC-POPs) by interpretable data fusion-driven machine learning global models.

Journal: Journal of hazardous materials
PMID:

Abstract

This study used available oral acute toxicity data in Rat and Mouse for polychlorinated persistent organic pollutants (PC-POPs) to construct data fusion-driven machine learning (ML) global models. Based on atom-centered fragments (ACFs), the collected high-throughput data overcame applicability domain limitations, enabling accurate toxicity prediction for a wide range of PC-POP series compounds with a single global model per species. The data variances of the Rat training and test sets were 1.52 and 1.34, respectively, and those of the Mouse training and test sets were 1.48 and 1.36, respectively. A genetic algorithm (GA) was used to build multiple linear regression (MLR) models and pre-screen descriptors, addressing the "black-box" problem prevalent in ML and enhancing model interpretability. The best ML models for Rat and Mouse achieved approximately 90 % prediction reliability for over 100,000 truly untested compounds. Based on the prediction results, a warning list of highly toxic compounds was generated for eight categories of polychlorinated atom-centered fragments (PCACFs). Descriptor analysis revealed that dioxin analogs generally exhibited higher toxicity: their heteroatoms and ring systems increase structural complexity and form larger conjugated systems, contributing to greater oral acute toxicity. The present study provides valuable insights for guiding subsequent in vivo tests, environmental risk assessment, and the improvement of the global pollutant governance system.
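The GA-based descriptor pre-screening for MLR mentioned above can be illustrated with a minimal sketch. The abstract does not specify the fitness function, genetic operators, or software used, so everything here is an assumption: chromosomes are boolean masks over descriptors, fitness is the adjusted R² of an ordinary least-squares fit, and the operators are tournament selection, uniform crossover, bit-flip mutation, and elitism. The data are synthetic (hypothetical compounds and descriptors), and a cross-validated statistic is a common alternative fitness in QSAR work. This is not the authors' implementation.

```python
import numpy as np


def adjusted_r2(X, y, mask):
    """Adjusted R^2 of an OLS multiple linear regression on the descriptors selected by mask."""
    n = len(y)
    k = int(mask.sum())
    if k == 0 or k >= n - 1:
        return -np.inf  # empty or saturated models are not allowed
    A = np.column_stack([np.ones(n), X[:, mask]])  # intercept + selected descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def ga_select(X, y, pop_size=30, generations=40, seed=0):
    """Hypothetical GA wrapper: evolves boolean descriptor masks to maximize adjusted R^2."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    pop = rng.random((pop_size, n_desc)) < 0.5  # random initial chromosomes
    for _ in range(generations):
        fitness = np.array([adjusted_r2(X, y, m) for m in pop])
        order = np.argsort(fitness)[::-1]
        new_pop = [pop[order[0]].copy(), pop[order[1]].copy()]  # elitism: keep the two best
        while len(new_pop) < pop_size:
            parents = []
            for _ in range(2):  # binary tournament selection
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fitness[i] >= fitness[j] else pop[j])
            # uniform crossover: each gene taken from either parent with probability 0.5
            child = np.where(rng.random(n_desc) < 0.5, parents[0], parents[1])
            # bit-flip mutation with rate 1 / n_desc
            child = child ^ (rng.random(n_desc) < 1.0 / n_desc)
            new_pop.append(child)
        pop = np.array(new_pop)
    fitness = np.array([adjusted_r2(X, y, m) for m in pop])
    best = pop[np.argmax(fitness)]
    return np.flatnonzero(best), float(fitness.max())


# Synthetic example: 100 hypothetical compounds x 10 hypothetical descriptors,
# where only descriptors 0, 3 and 7 actually drive the simulated toxicity value.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 1.0 * X[:, 7] + data_rng.normal(scale=0.1, size=100)
selected, score = ga_select(X, y)
print("selected descriptors:", selected.tolist(), "adjusted R2:", round(score, 3))
```

Because the selected descriptors enter a linear equation, each retained term has an explicit coefficient, which is the sense in which a GA-MLR step can make descriptor contributions interpretable before a non-linear ML model is trained on them.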

Authors

  • Shuo Chen
    Department of Thoracic Surgery, Beijing Chao-Yang Hospital Affiliated to Capital Medical University, Beijing, China.
  • Tengjiao Fan
    Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China. fantengjiao2014@emails.bjut.edu.cn.
  • Ting Ren
    Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China. renting@bjut.edu.cn.
  • Na Zhang
    Department of Nutrition and Food Hygiene, School of Public Health, Peking University, Beijing, China.
  • Lijiao Zhao
    Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China. zhaolijiao@bjut.edu.cn.
  • Rugang Zhong
    Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China. lifesci@bjut.edu.cn.
  • Guohui Sun
    Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China. sunguohui@bjut.edu.cn.