Machine learning-assisted data filtering and QSAR models for prediction of chemical acute toxicity on rat and mouse.

Journal: Journal of hazardous materials
Published Date:

Abstract

Machine learning (ML) methods offer a new opportunity to build quantitative structure-activity relationship (QSAR) models that predict chemical toxicity from large toxicity data sets, but their robustness is limited by poor data quality for chemicals with certain structures. To address this issue and improve model robustness, we built a large data set on rat oral acute toxicity for thousands of chemicals, then used ML to select the chemicals favorable for regression models (CFRM). CFRM accounted for 67% of the chemicals in the original data set and, compared with chemicals not favorable for regression models (CNRM), had higher structural similarity and a narrower toxicity distribution, concentrated in 2-4 log (mg/kg). Regression models established for CFRM performed markedly better, with root-mean-square errors (RMSE) of 0.45-0.48 log (mg/kg). Classification models were built for CNRM using all chemicals in the original data set, and the area under the receiver operating characteristic curve (AUROC) reached 0.75-0.76. The proposed strategy was also applied successfully to a mouse oral acute toxicity data set, yielding RMSE of 0.36-0.38 log (mg/kg) and AUROC of 0.79.
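
For readers unfamiliar with the two metrics reported above, the following is a minimal Python sketch of how RMSE (on log-transformed toxicity values) and AUROC can be computed. The abstract does not state the criterion used to separate CFRM from CNRM, so the `split_by_error` helper, its `threshold` parameter, and all numeric values below are hypothetical and serve only as an illustration, not as the authors' method.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error; values are assumed to be log10(LD50 in mg/kg)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def split_by_error(y_true, y_cv_pred, threshold):
    """Hypothetical filter (an assumption, not from the paper): label a chemical 1
    if its absolute cross-validated prediction error is within `threshold`, else 0."""
    return [1 if abs(t - p) <= threshold else 0 for t, p in zip(y_true, y_cv_pred)]

# Made-up values for illustration only.
y_true = [2.0, 3.0, 2.5, 4.0, 1.0]   # measured log10(LD50)
y_cv = [2.2, 2.9, 2.4, 2.5, 2.6]     # cross-validated predictions
labels = split_by_error(y_true, y_cv, threshold=0.5)

# RMSE restricted to the chemicals labeled 1.
kept = [(t, p) for t, p, l in zip(y_true, y_cv, labels) if l == 1]
rmse_kept = rmse([t for t, _ in kept], [p for _, p in kept])

# AUROC of a hypothetical classifier score that tries to predict the label.
clf_scores = [0.9, 0.8, 0.4, 0.6, 0.1]
auc = auroc(labels, clf_scores)
```

In this toy setup the RMSE is computed only on the chemicals that pass the filter, while the AUROC measures how well a separate classifier score recovers the filter label across all chemicals.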

Authors

  • Tao Bo
    Key Laboratory of Environmental and Applied Microbiology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; Environmental Microbiology Key Laboratory of Sichuan Province, Chengdu 610041, China.
  • Yaohui Lin
    State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, China; Key Laboratory for Analytical Science of Food Safety and Biology of MOE, Fujian Provincial Key Lab of Analysis and Detection for Food Safety, College of Chemistry, Fuzhou University, Fuzhou, Fujian 350116, China.
  • Jinglong Han
    State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China.
  • Zhineng Hao
    State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, China. Electronic address: znhao@rcees.ac.cn.
  • Jingfu Liu
    School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China; State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, China. Electronic address: jfliu@rcees.ac.cn.