Molecular polarizability as the universal driver of HPAH fate: Evidence from DFT-validated machine learning.

Journal: Journal of Hazardous Materials

Abstract

A DFT-explainable machine-learning (DFT-XML) strategy integrated with an active learning mechanism was proposed and validated to connect quantum-level molecular properties with the macroscopic environmental behavior of halogenated polycyclic aromatic hydrocarbons (HPAHs). Density functional theory (DFT)-derived quantum descriptors were combined with conventional physicochemical features for 57 representative HPAHs to construct a stacking-ensemble model, HPAHs-SM. Notably, a Query-By-Committee (QBC) strategy was implemented to quantify epistemic uncertainty, transforming the model from a static estimation tool into a dynamic guidance system for experimental design. The model predicted four key endpoints with strong test-set performance: Log P (octanol-water partition coefficient), Biowin (QSAR-based biodegradability estimate), Log BCF (bioconcentration factor), and Log KOC (organic carbon-water partition coefficient), with R² values of 0.875, 0.902, 0.842, and 0.909, respectively. Crucially, the framework was validated against literature-reported environmental half-lives (t1/2), showing robust agreement between predicted degradability trends and observed persistence. The active learning strategy prioritized high-uncertainty targets (e.g., Log BCF standard deviation = 0.705) for experimental validation, linking computational modeling to experimental work. Mechanistic interpretation using SHapley Additive exPlanations (SHAP) and descriptor analysis identified molecular polarizability as the principal determinant of adsorption and migration, while LUMO energy was found to strongly regulate degradability and potential biotoxicity.
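The Query-By-Committee idea mentioned above can be illustrated with a minimal sketch: several models are trained on resampled data, and the spread of their predictions for an unlabeled compound serves as an uncertainty score for prioritizing experiments. Everything below is an assumption made for illustration, not the HPAHs-SM implementation: the committee is built by bootstrapping a one-descriptor linear fit, the descriptor and endpoint values are invented toy numbers, and the names `train_committee` and `rank_by_uncertainty` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def train_committee(xs, ys, n_members=25, seed=0):
    """Fit one linear model per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    committee = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        committee.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return committee


def committee_disagreement(committee, x):
    """Return (mean prediction, standard deviation across members) at x."""
    preds = [a + b * x for a, b in committee]
    return statistics.fmean(preds), statistics.pstdev(preds)


def rank_by_uncertainty(committee, candidates):
    """Sort candidate compounds by committee disagreement, largest first."""
    scored = []
    for name, x in candidates.items():
        mean, std = committee_disagreement(committee, x)
        scored.append((name, mean, std))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy training data (invented): one descriptor vs. one endpoint.
train_x = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
train_y = [4.1, 4.6, 4.7, 5.3, 5.4, 6.0]

committee = train_committee(train_x, train_y)

# Unlabeled candidates: one inside the training range, one far outside it.
candidates = {"candidate_near": 25.0, "candidate_far": 60.0}
ranking = rank_by_uncertainty(committee, candidates)

for name, mean, std in ranking:
    print(f"{name}: prediction={mean:.2f}, committee std={std:.3f}")
```

In this sketch the candidate that lies far outside the training range receives the larger committee standard deviation and is ranked first for measurement, which is the sense in which a disagreement score can guide experimental design.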
Compounds were projected into the model's interpretation space and subjected to unsupervised clustering, yielding five risk-based subgroups that improved translation from microscopic descriptors to macroscopic risk assessment. For instance, 6-chloro- and 6-bromo-benzo[a]pyrene, which are characterized by high polarizability and large molecular mass, were classified as high risk with a "strong adsorption-low degradation" profile. An open-access web platform (HPAHs-ML) was developed to support molecular input, multi-endpoint prediction, and visualization (https://hpahs-stacking-model.streamlit.app/). The proposed DFT-XML paradigm was demonstrated to provide an interpretable, self-evolving, and data-driven route for environmental risk assessment and for guiding the design of lower-risk halogenated compounds.
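The clustering step described above can be sketched as follows. The abstract does not name the clustering algorithm, so k-means is used here purely as an assumed stand-in, with a deterministic farthest-point initialization. The two-dimensional attribution vectors are invented toy values (one column loosely standing for a polarizability attribution, one for a LUMO-energy attribution), and the sketch uses two clusters rather than the five subgroups reported.

```python
import math


def kmeans(points, k, n_iter=50):
    """Plain k-means with farthest-point initialization.

    points: list of equal-length tuples; returns (labels, centers).
    """
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers


# Toy per-compound attribution vectors (invented values).
attributions = [
    (0.90, -0.40), (1.00, -0.50), (0.95, -0.45),   # high-adsorption-like group
    (-0.80, 0.30), (-0.90, 0.40), (-0.85, 0.35),   # low-adsorption-like group
]

labels, centers = kmeans(attributions, k=2)
print(labels)
```

Clustering in attribution space rather than raw descriptor space groups compounds by how the model explains their predictions, which is one plausible reading of "the model's interpretation space".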
