Interpretable Machine Learning to Understand Wildfire Toxicity: Bridging Chemicals, Omics, and Toxicological Outcomes via Symbolic Regression with Novel Feature Scoring.

Journal: Chemical Research in Toxicology

Abstract

Wildfire smoke exposures are increasingly common, consisting of complex mixtures of gases and particulates known to cause diverse pulmonary health effects. While health outcomes are regularly studied, quantitative links between smoke chemical composition and toxicological outcomes remain poorly defined, limiting interpretation of wildfire smoke health risks. This study explores symbolic regression (SR) as an interpretable artificial intelligence/machine learning method to generate closed-form mathematical models linking chemical exposure to biological responses relevant to wildfire smoke. Prior to application on wildfire-relevant data sets, we benchmarked three Python-based SR packages on simulated data, assessing performance across varying noise levels and operator complexities. Insights from these simulation tests, such as the importance of including necessary operators, were incorporated when applying SR to lab-generated wildland fire exposure-toxicity data. This data set included chemical characterizations of biomass smoke exposures and corresponding pulmonary responses in female CD-1 mice (n = 60). Specifically, we evaluated the ability to predict a lung injury marker using (1) targeted measures of over 80 chemicals measured in smoke (RMSE = 17.57 mg/mL) and (2) lung tissue measures of hundreds of transcripts (RMSE = 15.12 mg/mL). Resulting error metrics were comparable to Random Forest and XGBoost models. To aid model interpretation, we developed directional ensemble contribution scores (DECS), a novel feature importance scoring method that quantifies the direction and magnitude of predictor contributions across top-performing models. Expert toxicologists also contributed to model prioritization, integrating a "biologists-in-the-loop" approach. Results highlighted polycyclic aromatic hydrocarbons as drivers of lung injury and methoxyphenols as suppressors. Transcriptomic analyses highlighted a small set of genes, which have roles in metabolism, cell proliferation, immune regulation, and oncogenic processes, with MYC proto-oncogene (Myc) showing the strongest association. Overall, this study demonstrates SR and associated DECS as practical, interpretable tools for modeling environmental mixtures, such as wildfire smoke, and their toxicological effects.
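The abstract does not give the DECS formula, so the following is only an illustrative sketch of the general idea of a signed, ensemble-level feature score. It is not the authors' method. It assumes that each top-performing model is available as a closed-form Python callable, that a feature's direction and magnitude can be approximated by the mean change in prediction when that feature is raised by a fixed step, and that models are weighted by inverse RMSE. The function names, the weighting scheme, and the toy models are all hypothetical.

```python
from statistics import mean


def signed_effect(model, rows, j, step=1.0):
    """Mean change in the prediction when feature j is raised by `step`.

    Hypothetical helper: a finite-difference stand-in for a feature's
    signed contribution within one closed-form model.
    """
    deltas = []
    for row in rows:
        bumped = list(row)
        bumped[j] += step
        deltas.append(model(bumped) - model(row))
    return mean(deltas)


def directional_scores(models, rmses, rows, n_features):
    """Inverse-RMSE-weighted average of signed effects across models.

    Assumption: lower-error models should count more. The sign of each
    score gives the direction (driver vs. suppressor), the absolute
    value gives the magnitude.
    """
    weights = [1.0 / r for r in rmses]
    total = sum(weights)
    scores = []
    for j in range(n_features):
        weighted = sum(
            w * signed_effect(m, rows, j) for m, w in zip(models, weights)
        )
        scores.append(weighted / total)
    return scores


# Toy ensemble of two closed-form expressions over three features
# (made-up coefficients, not taken from the study).
models = [
    lambda x: 2.0 * x[0] - 1.0 * x[1],
    lambda x: 3.0 * x[0] - 0.5 * x[1] + 0.0 * x[2],
]
rmses = [1.0, 2.0]
rows = [[1.0, 2.0, 3.0], [0.5, 1.5, 2.5]]

scores = directional_scores(models, rmses, rows, n_features=3)
for j, s in enumerate(scores):
    print(f"feature {j}: {s:+.3f}")
```

In this toy setup, feature 0 receives a positive score (it raises the predicted response in both models), feature 1 a negative score, and feature 2 a score of zero because neither expression depends on it. A real application would replace the lambdas with the expressions returned by the SR package and would likely standardize features before comparing magnitudes.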
