Letter to the Editor regarding "Prediction of PFAS bioaccumulation in different plant tissues with machine learning models based on molecular fingerprints" by Song et al. (2024), Sci. Total Environ. 950 175091.

Journal: Science of the Total Environment

Published Date: May 23, 2025

Abstract

Song et al. (2024), "Prediction of PFAS bioaccumulation in different plant tissues with machine learning models based on molecular fingerprints," used machine learning methods, including XGBoost together with SHapley Additive exPlanations (SHAP), to predict PFAS bioaccumulation and reported high predictive accuracy. This commentary critically examines their interpretation of feature importance, because high predictive accuracy does not guarantee reliable feature importance. Both XGBoost and SHAP are known to exhibit biases: XGBoost tends to overemphasize features used in early splits, and SHAP values inherit whatever biases the underlying model carries. In addition, the high dimensionality and potential collinearity of molecular fingerprints complicate SHAP interpretation, increase the risk of overfitting, and reduce the stability of SHAP values. As a general illustration, we conducted an independent simulation on a publicly available dataset of US industrial facilities and environmental compliance, which revealed marked discrepancies between the feature importance rankings from XGBoost and those from robust statistical tests. This commentary advocates robust statistical methods with accompanying p-values, including Spearman's rho, Kendall's tau, Goodman-Kruskal's gamma, Somers' delta, and Hoeffding's D (a measure of dependence), for feature selection. Because these non-parametric methods rely on data ranks and are independent of specific model assumptions, they are better suited to capturing complex relationships in high-dimensional data and provide a more reliable foundation for future PFAS bioaccumulation research.
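The abstract names the rank-based statistics but does not show how they would be applied to feature screening. The Python sketch below is a minimal, stdlib-only illustration under stated assumptions and is not the authors' code: each candidate feature is assumed to be a numeric list aligned with a numeric target, all function names (`ranks`, `spearman_rho`, `ordinal_measures`, `permutation_pvalue`, `screen_features`) are hypothetical, p-values come from a two-sided permutation test (one possible choice; the letter does not say how its p-values were obtained), and Hoeffding's D is omitted for brevity.

```python
# Minimal sketch (hypothetical helper names, stdlib only): rank-based
# association measures with permutation p-values for feature screening.
# Hoeffding's D is omitted for brevity.
import math
import random
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def pair_counts(x, y):
    """Concordant, discordant, tied-on-x-only and tied-on-y-only pair counts."""
    c = d = tx = ty = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every measure here
        if dx == 0:
            tx += 1
        elif dy == 0:
            ty += 1
        elif (dx > 0) == (dy > 0):
            c += 1
        else:
            d += 1
    return c, d, tx, ty


def ordinal_measures(x, y):
    """Goodman-Kruskal gamma, Kendall's tau-b and Somers' d of y given x."""
    c, d, tx, ty = pair_counts(x, y)
    gamma = (c - d) / (c + d) if c + d else 0.0
    tau_denom = math.sqrt((c + d + tx) * (c + d + ty))
    tau_b = (c - d) / tau_denom if tau_denom else 0.0
    # Somers' d(y|x): the denominator counts only pairs that differ on x.
    untied_x = c + d + ty
    somers_d = (c - d) / untied_x if untied_x else 0.0
    return {"gamma": gamma, "tau_b": tau_b, "somers_d": somers_d}


def permutation_pvalue(x, y, stat=spearman_rho, n_perm=999, seed=0):
    """Two-sided permutation p-value: shuffling y breaks any x-y association."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(stat(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def screen_features(features, y, n_perm=999):
    """Score every feature and sort by Spearman permutation p-value."""
    rows = []
    for name, x in features.items():
        row = {"feature": name, "rho": spearman_rho(x, y),
               "p_value": permutation_pvalue(x, y, n_perm=n_perm)}
        row.update(ordinal_measures(x, y))
        rows.append(row)
    return sorted(rows, key=lambda r: r["p_value"])


if __name__ == "__main__":
    # Toy data (made up for illustration): y is a monotone function of x1 only.
    rng = random.Random(42)
    x1 = [rng.uniform(0, 1) for _ in range(30)]
    features = {"x1": x1, "noise": [rng.uniform(0, 1) for _ in range(30)]}
    y = [v ** 3 for v in x1]
    for row in screen_features(features, y):
        print(row["feature"], round(row["rho"], 3), row["p_value"])
```

Ranking by a p-value from a rank-based test keeps the screening independent of any fitted model, which is the property the abstract argues for. The pair counting is O(n²) and the permutation loop multiplies the cost by `n_perm`, so for realistically sized fingerprint matrices an established statistics library would be the more practical route.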

Authors

Souichi Oka

SciencePark Corporation, 3-24-9 Iriya-Nishi Zama-shi, Kanagawa 252-0029, Japan. Electronic address: souichi.oka@sciencepark.co.jp.

Yoshiyasu Takefuji

Faculty of Data Science, Musashino University, 3-3-3 Ariake Koto-ku, Tokyo, 135-8181, Japan.

Keywords

Bioaccumulation; Environmental Monitoring; Environmental Pollutants; Fluorocarbons; Machine Learning; Plants

External Resources

PubMed ID: 40412074
