Comments on "Dialogue between algorithms and soil: Machine learning unravels the mystery of phthalates pollution in soil" by Pan et al. (2025).
Journal:
Journal of Hazardous Materials
Published Date:
Apr 22, 2025
Abstract
Pan et al. demonstrated the superior predictive performance of their machine learning (ML) models for soil phthalate (PAE) concentrations, highlighting the critical role of feature importance as assessed by SHapley Additive exPlanations (SHAP). Notably, the Multilayer Perceptron (MLP) model achieved the highest performance (R² = 0.8637), followed by Support Vector Regression (SVR) and XGBoost. However, concerns persist regarding the reliability of feature importance derived from these models and their SHAP interpretations. Specifically, predictive accuracy does not guarantee the validity of feature rankings, because tree-based, neural network, and kernel-based methods each carry inherent biases, and SHAP, which depends entirely on model outputs, inherits and can amplify them. To mitigate these biases, integrating robust statistical methods is crucial. Techniques such as Spearman's rho, Kendall's tau, Goodman-Kruskal's gamma, Somers' delta, and Hoeffding's dependence, combined with p-value analysis, offer unbiased assessments. Using these statistical methods alongside ML models ensures a more reliable evaluation of feature importance in environmental risk modeling. Consequently, future research should prioritize methodologies that combine ML with rigorous statistical validation to enhance accuracy and reduce biases.
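As an illustration of the kind of model-independent check the abstract recommends, the following minimal sketch computes Spearman's rho and Kendall's tau between one soil feature and PAE concentration, each with a permutation-based p-value. It is not the method of Pan et al. or of the comment's authors. The variable names and the data values are hypothetical, the tau-a variant and the permutation test are choices made here for simplicity, and only the Python standard library is used.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


def permutation_p_value(stat, x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for an association statistic."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(stat(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical values for illustration only; not taken from the study.
soil_organic_matter = [1.2, 2.5, 0.8, 3.1, 2.0, 1.7, 2.9, 0.5]
pae_concentration = [0.30, 0.62, 0.21, 0.80, 0.41, 0.45, 0.70, 0.15]

for name, stat in [("Spearman rho", spearman_rho), ("Kendall tau-a", kendall_tau_a)]:
    value = stat(soil_organic_matter, pae_concentration)
    p = permutation_p_value(stat, soil_organic_matter, pae_concentration)
    print(f"{name}: {value:.3f} (permutation p = {p:.4f})")
```

Repeating this for every candidate feature gives a ranking that does not depend on any fitted model, which can then be compared against the SHAP-based ranking. In practice a statistics library would normally replace these hand-written functions, and testing many features at once would call for a multiple-comparison correction.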
Authors