Predictive modeling and interpretability analysis of bioconcentration factors for organic chemicals in fish using machine learning.
Journal:
Environmental pollution (Barking, Essex : 1987)
Published Date:
Jul 15, 2025
Abstract
Chemicals are misused and released into the environment, causing adverse effects on people and ecosystems. Assessing the potential environmental risks of these chemicals before their use is crucial. The bioconcentration factor (BCF) is a key parameter used to describe the extent of chemical bioaccumulation. However, experimental determination of BCF values is often time-consuming and costly. In this study, machine learning (ML) models were developed to predict BCF values using molecular descriptors and 9 algorithms. The random forest (RF) model demonstrated strong predictive performance, achieving R and R values of 0.949 and 0.935, respectively. Moreover, it required only 10 easily obtainable features. The Tanimoto similarity coefficient based on molecular structure was used to characterize the applicability domain (AD). We employed the SHAP method, which identified the primary factors that significantly affect BCF values: hydrophobicity, molecular volume and shape, polarizability, and lipophilicity. Furthermore, partial dependence plots (PDP) and 2D interaction plots were used to examine the relationship between feature values and model predictions in more detail. Results showed that MollogP>4.5, SM1_Dzv>0, SM1_Dzp>0, and ZM1C1>35 were linked to higher lgBCF values (3.2 L/kg), indicating stronger bioconcentration potential. Conversely, for chemicals outside these ranges, which suggests weaker bioconcentration capacity, the focus should shift to environmental migration. The study provides insights into the factors that influence the bioaccumulation of chemicals, and the RF model can serve as an effective tool for assessing the bioconcentration potential of chemicals.
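To make the applicability-domain idea concrete, the following is a minimal sketch of a Tanimoto-based check, not the authors' implementation. The abstract does not state which fingerprint type or similarity cut-off was used, so the set-of-bit-indices representation, the function names, the toy fingerprints, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two fingerprints,
    each given as the set of indices of its 'on' bits."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_applicability_domain(
    query: Set[int],
    training: Iterable[Set[int]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the study
) -> bool:
    """Treat a query as inside the AD if its nearest training
    fingerprint is at least `threshold` similar."""
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best >= threshold


# Toy fingerprints (made up for illustration only).
training_fps = [{1, 4, 7, 9}, {2, 4, 8}, {1, 2, 3, 4}]
query_fp = {1, 4, 9, 12}      # shares 3 bits with the first training entry
unrelated_fp = {20, 21}       # shares no bits with any training entry

print(in_applicability_domain(query_fp, training_fps))
print(in_applicability_domain(unrelated_fp, training_fps))
```

In practice the fingerprints would come from a cheminformatics toolkit, and predictions for queries that fall outside the domain would be flagged as less reliable rather than discarded outright.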