Integrating pesticide molecular, soil, and electrolyte features through tree-based ensemble learning for accurate sorption prediction.

Journal: Environmental Monitoring and Assessment
Published Date:

Abstract

Although machine learning (ML) is increasingly used to predict pesticide sorption, most existing models rely primarily on molecular and soil descriptors and rarely account for solution chemistry. This limits their relevance for environmental monitoring under variable water quality conditions such as salinity. In this study, an interpretable ensemble ML framework was developed to predict the soil-water sorption coefficient (K) by integrating pesticide molecular properties, soil characteristics, and electrolyte descriptors. A dataset of 975 laboratory-derived batch sorption observations, covering 65 pesticides, 23 soils, and multiple electrolyte types and concentrations, was compiled from peer-reviewed studies. Six tree-based ensemble models were trained and evaluated; extreme gradient boosting (XGB) achieved the best predictive performance on an independent test set (R2 = 0.938; RMSE = 0.248). Model interpretation using SHapley Additive exPlanations (SHAP) showed that electrolyte parameters were the dominant predictors: electrolyte molecular mass (MME, 28%) and electrolyte concentration (CE, 25%) contributed more than traditional descriptors such as the octanol-water partition coefficient (KOW, 18%) and soil organic carbon (OC, 12%). Nonlinear response patterns were consistent with ionic strength effects and cation-bridging mechanisms, supporting the mechanistic plausibility of the learned relationships. External validation under unseen conditions confirmed robust generalization (R2 = 0.950). This cost-effective, interpretable framework enables regulators and agronomists to simulate pesticide mobility across varying water quality conditions, thereby supporting sustainable pesticide management and targeted risk assessment.
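The reported quantities (R2, RMSE, and per-feature percentage contributions) can be illustrated with a minimal Python sketch. The abstract does not state how the SHAP percentages were derived; the sketch assumes they are mean absolute SHAP values normalized to sum to 100%, which is a common convention but not confirmed by the source. The function names and all numeric inputs below are hypothetical toy values, not data from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target variable."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def shap_percent_contribution(shap_rows, feature_names):
    """Mean |SHAP| per feature, normalized so all features sum to 100.

    Assumption: this normalization is one plausible way to obtain
    percentages such as those quoted in the abstract.
    """
    n = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    return {name: 100.0 * v / total for name, v in zip(feature_names, mean_abs)}


# Hypothetical toy inputs (NOT values from the study).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.5]
features = ["MME", "CE", "KOW", "OC"]
shap_rows = [
    [0.4, -0.2, 0.1, 0.1],   # per-feature SHAP values for observation 1
    [-0.4, 0.2, -0.3, 0.1],  # per-feature SHAP values for observation 2
]

test_r2 = r2_score(y_true, y_pred)
test_rmse = rmse(y_true, y_pred)
shares = shap_percent_contribution(shap_rows, features)
```

In practice the SHAP matrix would come from an explainer applied to the fitted model; here it is typed in by hand so the sketch stays dependency-free.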

Authors
