Development of an Interpretable Machine Learning Model for Neurotoxicity Prediction of Environmentally Related Compounds.
Journal:
Environmental Science & Technology
Published Date:
Apr 30, 2025
Abstract
The rising prevalence of nervous system disorders has become a significant global health challenge, and environmental pollutants have been identified as key contributors. However, the large number of environmentally related compounds, combined with the low efficiency of traditional methods, has left substantial gaps in neurotoxicity data. In this study, we developed a robust and interpretable neurotoxicity prediction model using a high-quality data set. To identify the best predictive model, we evaluated three molecular representation methods (molecular fingerprints, molecular descriptors, and molecular graphs) in combination with six traditional machine learning (ML) algorithms and two deep learning (DL) approaches. The optimal model, which combines molecular fingerprints and descriptors with eXtreme Gradient Boosting (XGBoost), achieved a training accuracy of 0.93 and an area under the curve (AUC) of 0.99, outperforming the other ML and DL models while maintaining interpretability. The model was used to screen 1170 compounds detected in human blood and produced predictions for 1145 of them. Among the 89 compounds with known neurotoxicity data, the model achieved an accuracy of 0.74. It identified 821 potentially neurotoxic compounds, including 36 with high detection concentrations, warranting further study. An online platform (http://www.envwind.site/tools.html) was developed to make the model more widely accessible. This model offers an efficient tool for predicting neurotoxicity and managing environmental health risks.
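The two metrics reported above, accuracy and AUC, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the function names `accuracy` and `roc_auc` are hypothetical helpers introduced only for this example. AUC is computed here as the fraction of (toxic, non-toxic) pairs in which the toxic compound receives the higher score, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of compounds whose thresholded score matches the label.

    The 0.5 threshold is an assumption for illustration only.
    """
    preds = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(preds, y_true))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative.

    Compares every (positive, negative) pair; a tie counts as 0.5.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = neurotoxic, 0 = non-neurotoxic.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print("accuracy:", accuracy(labels, scores))
print("AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank the compounds, which is why a model can report different values for the two metrics on the same data.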