MuCB-tabpfn: A multimodal feature fusion framework for predicting human blood concentrations of organic pollutants.

Journal: Ecotoxicology and environmental safety

Abstract

Accurate prediction of chemical concentrations in human blood is essential for evaluating the health risks associated with synthetic organic pollutants. However, existing models are often limited by data scarcity, incomplete feature representation, and restricted predictive accuracy. To overcome these challenges, we developed MuCB-tabpfn, a multimodal deep learning framework that integrates ADME parameters, PaDEL molecular descriptors, and fine-tuned Himol features obtained through graph-based transfer learning. This integrated approach characterizes chemicals by their pharmacokinetic behavior, structural attributes, and semantically rich molecular representations, thereby enhancing the prediction of blood concentrations (Cb). Trained on a curated dataset of 216 environmental compounds compiled from the NHANES, Biomonitoring California, and ExposureExplorer databases, MuCB-tabpfn achieved an R² of 0.856 and an RMSE of 1.456 for lnCb. It consistently outperformed conventional machine learning models and single-modality approaches in comparative evaluations. The model also remained robust in noise resistance tests and captured complex nonlinear feature interactions. SHAP-based interpretability analysis identified the most influential descriptors, including daily exposure, elimination half-life, and exposure pathway indicators. When applied to screen 156 Substances of Very High Concern, MuCB-tabpfn identified compounds with elevated internal exposure potential, such as methoxyacetic acid and 1,2-dimethoxyethane, demonstrating its practical utility in chemical risk prioritization. By combining high predictive accuracy, resilience to noise, and interpretable insights, MuCB-tabpfn provides a reliable and efficient computational tool for supporting next-generation chemical safety assessment and advancing internal exposure estimation in environmental health research.
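
The abstract does not say how the three feature blocks are combined or how the metrics were computed, so the following is only a minimal sketch under stated assumptions: fusion is taken to be plain column-wise concatenation, the data are synthetic random numbers (not the study's 216 compounds), and a closed-form ridge regression stands in for the TabPFN-based model. All function names (`fuse_features`, `standardize`, `fit_ridge`, `predict_ridge`) and block widths are hypothetical and chosen for illustration. It shows how R² and RMSE on lnCb would be evaluated on a held-out split.

```python
import numpy as np

def fuse_features(*blocks):
    """Feature-level fusion (assumed): concatenate modality blocks column-wise."""
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all feature blocks must describe the same compounds")
    return np.hstack(blocks)

def standardize(train, test):
    """Z-score both splits using statistics from the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (stand-in model)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict_ridge(weights, X):
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ weights

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (here ln Cb)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic placeholders for the three modalities; widths are arbitrary.
rng = np.random.default_rng(0)
n = 216
adme = rng.normal(size=(n, 5))           # stand-in for ADME parameters
descriptors = rng.normal(size=(n, 20))   # stand-in for PaDEL descriptors
embeddings = rng.normal(size=(n, 16))    # stand-in for learned graph features
X = fuse_features(adme, descriptors, embeddings)

# Synthetic target playing the role of ln Cb.
ln_cb = 1.5 * X[:, 0] - X[:, 7] + rng.normal(scale=0.5, size=n)

train, test = np.arange(0, 170), np.arange(170, n)
X_train, X_test = standardize(X[train], X[test])
weights = fit_ridge(X_train, ln_cb[train])
pred = predict_ridge(weights, X_test)

print("R2:", r2_score(ln_cb[test], pred))
print("RMSE:", rmse(ln_cb[test], pred))
```

Because the target is log-transformed, an RMSE of 1.456 on lnCb corresponds to a typical multiplicative error of about e^1.456 ≈ 4.3 in Cb itself. The numbers printed by this sketch describe only the synthetic data and are unrelated to the reported results.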
