Explainable no-code OECD-compliant machine learning models to predict the mutagenic activity of polycyclic aromatic hydrocarbons and their radical cation metabolites.
Journal: The Science of the total environment
PMID: 40101616
Abstract
Polycyclic aromatic hydrocarbons (PAHs) are persistent pollutants with well-known genotoxic and mutagenic effects, posing risks to ecosystems and human health. Their hydrophobic nature promotes accumulation in soils and aquatic environments, increasing exposure risks. Upon metabolic activation, PAHs generate reactive species that form DNA adducts, driving their mutagenic potential. This study presents an OECD-compliant methodology that integrates conceptual density functional theory (CDFT) calculations at the GFN2-xTB level with machine learning models to predict PAH mutagenicity. Using quantum chemical descriptors of procarcinogens and radical cation metabolites alongside Ames test data, the study identified key electronic properties linked to mutagenicity. Feature selection consistently highlighted radical cation descriptors as primary indicators of metabolic activation pathways. Machine learning models, including SPAARC, Random Tree, and JCHAID, achieved validation accuracies exceeding 89% with minimal false-negative rates, ensuring conservative predictions for environmental risk assessment. The PSL and CDP electrophilicity frameworks proved particularly effective in modeling DNA damage-related processes. This no-code, freeware-based methodology provides a scalable and cost-effective tool for assessing mutagenic risks in environmentally relevant conditions. The findings reinforce the importance of metabolic activation, validate the radical cation as a reliable proxy for this process, and demonstrate the predictive value of electronic properties in QSAR modeling. These insights support advances in environmental toxicology and contribute to improved strategies for regulatory risk assessment.
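The two validation figures quoted above, accuracy and false-negative rate, both follow from a binary confusion matrix. The sketch below is only an illustration in Python, not the authors' workflow (which the abstract describes as no-code). The function names are hypothetical, the labels are made-up and are not study data, and the coding 1 = Ames-positive (mutagenic), 0 = Ames-negative is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier; 1 = mutagenic (assumed coding)."""
    tp: int  # mutagenic, predicted mutagenic
    fn: int  # mutagenic, predicted non-mutagenic (the costly error here)
    fp: int  # non-mutagenic, predicted mutagenic
    tn: int  # non-mutagenic, predicted non-mutagenic


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally the four outcome types from paired true/predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all compounds classified correctly."""
    total = c.tp + c.fn + c.fp + c.tn
    if total == 0:
        raise ValueError("no samples")
    return (c.tp + c.tn) / total


def false_negative_rate(c: ConfusionCounts) -> float:
    """Fraction of truly mutagenic compounds predicted as non-mutagenic."""
    positives = c.tp + c.fn
    if positives == 0:
        raise ValueError("no positive samples")
    return c.fn / positives


if __name__ == "__main__":
    # Illustrative labels only (NOT data from the study).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print(counts)
    print("accuracy:", accuracy(counts))
    print("false-negative rate:", false_negative_rate(counts))
```

A model whose only errors are false positives, as in these made-up labels, is conservative in the sense used above: it may over-flag safe compounds but does not miss mutagenic ones.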