Enhancing Toxicity Prediction of Synthetic Chemicals via Novel SMILES Fragmentation and Interpretable Deep Learning.

Journal: Journal of Chemical Information and Modeling

Published Date: Aug 26, 2025

Abstract

Toxicity prediction and identification of structural alerts (SAs) for synthetic chemicals are critical for assessing risks to environmental and human health. Traditional methods, which rely heavily on molecular descriptors, often suffer from poor interpretability. Here, we introduce a novel framework, denoted SFDL, that integrates SMILES fragmentation strategies with a 1D convolutional neural network deep learning model to predict chemical toxicity and the associated SAs. Four fragmentation methods (single-atom, single-symbol, atom-centered, and symbol-centered) were evaluated to generate tokenizers (denoted GenTok) from 581,537 high-interest PubChem compounds. The symbol-centered fragmentation approach performed best on the ISSSTY AMES mutagenicity data set (AUC = 0.87, PRAUC = 0.90). This SFDL-GenTok strategy showed robust predictive performance on 6 of the 10 toxicity end points (AUC = 0.81–0.93, PRAUC = 0.70–0.94). Based on these models, toxicity predictions were made for 28,160 synthetic chemicals. Potentially toxic compounds were subsequently categorized into three groups: endocrine disruption, mutagenicity, and mitochondrial toxicity. SA analysis revealed that halogenated fragments, nitro or phenolic groups, and reactive electrophilic motifs are critical contributors to endocrine disruption, mitochondrial toxicity, and mutagenicity. This study provides an interpretable tool for toxicity prediction and SA identification of synthetic chemicals.
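The abstract names four fragmentation methods but does not define them, so the following is only a minimal sketch of what such SMILES tokenizers might look like. It assumes that "single-symbol" means one token per character, that "single-atom" keeps multi-character atoms (bracket atoms, Cl, Br) together, and that the "centered" variants join each token with its immediate neighbours. The function names, the regular expression, and the `radius` parameter are hypothetical and are not taken from the paper.

```python
import re

# Assumed pattern: bracket atoms first, then two-letter halogens,
# then any remaining single character (atoms, bonds, ring digits, parentheses).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

def single_symbol(smiles):
    """Split a SMILES string into individual characters."""
    return list(smiles)

def single_atom(smiles):
    """Split a SMILES string so that multi-character atoms stay whole."""
    return ATOM_PATTERN.findall(smiles)

def centered(tokens, radius=1):
    """Join each token with up to `radius` neighbours on each side."""
    fragments = []
    for i in range(len(tokens)):
        lo = max(0, i - radius)
        fragments.append("".join(tokens[lo:i + radius + 1]))
    return fragments

smiles = "Clc1ccccc1[N+](=O)[O-]"  # a chloronitrobenzene, used only as an example

symbol_tokens = single_symbol(smiles)        # character-level tokens
atom_tokens = single_atom(smiles)            # atom-aware tokens
symbol_centered = centered(symbol_tokens)    # windows around each character
atom_centered = centered(atom_tokens)        # windows around each atom-aware token
```

In this sketch the centered fragments carry local context (for example a halogen together with the ring atom it is attached to), which is one plausible reason a fragment-level vocabulary could make structural alerts easier to read off than single characters.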

Authors

Yumian Zhou

Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing 21166, Jiangsu, China.

Yu He

Key Laboratory for Analytical Science of Food Safety and Biology, Fujian Provincial Key Laboratory of Analysis and Detection Technology for Food Safety, College of Chemistry, Fuzhou University, Fuzhou, Fujian, 350116, China.

Wenzheng Zhou

Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing 21166, Jiangsu, China.

Zhencheng Hua

Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing 21166, Jiangsu, China.

Yijing Wang

Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention, Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China.

Chao Chen

Department of Neonatology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China.

Keywords

Deep Learning; Humans; Mutagenicity Tests; Neural Networks, Computer

External Resources

PubMed ID (PMID): 40856693
