Explainable machine learning models for predicting the acute toxicity of pesticides to sheepshead minnow (Cyprinodon variegatus).

Journal: The Science of the total environment
PMID:

Abstract

A quantitative structure-activity relationship (QSAR) study was conducted on 313 pesticides to predict their acute toxicity to sheepshead minnow (Cyprinodon variegatus) using DRAGON descriptors. The elements required for a reliable model were considered carefully, with full attention to the OECD (Organisation for Economic Co-operation and Development) principles for the regulatory acceptability of QSAR models throughout model construction and assessment. Nine variables were selected by forward stepwise regression and used as inputs to construct both linear and nonlinear models. The resulting models were validated internally and externally. In general, the machine learning-based methods, namely support vector machine (SVM), random forest (RF), and projection pursuit regression (PPR), performed better than the multiple linear regression (MLR) model. The statistical results (R = 0.682-0.933, Q = 0.604-0.659, Q = 0.740-0.796, CCC = 0.861-0.882) of the developed models indicate that they are robust, reliable, reproducible, accurate and predictive. Of these, the RF model performed best, giving a predictive correlation coefficient Q of 0.814, a root mean squared error (RMSE) of 0.658 and a mean absolute error (MAE) of 0.534 for the test set. The RF model (as well as the SVM and PPR models) was visualized and explained using SHapley Additive exPlanations (SHAP) analysis to enhance its transparency and credibility. In addition, the applicability domain (AD) of the RF model was characterized by the Williams plot, and the tree manifold approximation and projection (TMAP) technique was used to illustrate the similarity and diversity of the entire data space and to assist in the analysis of outliers. Activity cliff detection was investigated using Arithmetic Residuals in K-groups Analysis (ARKA) descriptors. None of the pesticides was identified as an activity cliff in the training set or as a potential prediction cliff in the test set. Therefore, the RF model fulfills each OECD principle for QSAR models used in regulation. This work will aid the in silico QSAR prediction of acute toxicity to sheepshead minnow (Cyprinodon variegatus) for untested and new toxic pesticides and can also be extended to other studies.
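
The external-validation statistics named in the abstract (RMSE, MAE, CCC) and the warning leverage commonly drawn on a Williams plot can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the toy values, and the training-set size of 100 are hypothetical, and the leverage threshold h* = 3(p + 1)/n is the conventional choice, which the abstract does not state explicitly.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def warning_leverage(n_descriptors, n_train):
    """Conventional Williams-plot threshold h* = 3(p + 1)/n (assumed here)."""
    return 3 * (n_descriptors + 1) / n_train


# Toy values for illustration only; these are not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))
print(ccc(observed, predicted))
# Nine descriptors as in the abstract; n_train = 100 is a hypothetical size.
print(warning_leverage(9, 100))
```

Unlike a plain correlation coefficient, CCC penalizes a systematic offset between predictions and observations: in the toy example the predictions correlate perfectly with the observations but are shifted by one unit, so CCC falls below 1.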

Authors

  • Ting Sun
    Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing, 100083, People's Republic of China.
  • Chongzhi Wei
    School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China.
  • Yang Liu
    Department of Computer Science, Hong Kong Baptist University, Hong Kong, China.
  • Yueying Ren
    School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China; Ministry of Education Engineering Research Center of Water Resource Comprehensive Utilization in Cold and Arid Regions, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China. Electronic address: renyueying@mail.lzjtu.cn.