Enhancing Toxicity Prediction of Synthetic Chemicals via Novel SMILES Fragmentation and Interpretable Deep Learning.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Toxicity prediction and identification of structural alerts (SAs) for synthetic chemicals are critical for assessing risks to environmental and human health. Traditional methods, which rely heavily on molecular descriptors, often suffer from poor interpretability. Here, we introduce a novel framework that integrates SMILES fragmentation strategies with a 1D convolutional neural network deep learning model (denoted as SFDL) for predicting chemical toxicity and identifying the associated SAs. Four distinct fragmentation methods (single-atom, single-symbol, atom-centered, and symbol-centered) were evaluated to generate tokenizers (denoted as GenTok) from 581,537 high-interest PubChem compounds. The symbol-centered fragmentation approach performed best on the ISSSTY AMES mutagenicity data set (AUC = 0.87, PRAUC = 0.90). This SFDL-GenTok strategy showed robust predictive performance on 6 of the 10 toxicity end points (AUC = 0.81 to 0.93, PRAUC = 0.70 to 0.94). Based on these models, toxicity predictions were made for 28,160 synthetic chemicals. Potentially toxic compounds were subsequently categorized into three groups: endocrine disruption, mutagenicity, and mitochondrial toxicity. SA analysis revealed that halogenated fragments, nitro or phenolic groups, and reactive electrophilic motifs are critical contributors to endocrine disruption, mitochondrial toxicity, and mutagenicity. This study provides an interpretable tool for toxicity prediction and SA identification of synthetic chemicals.
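To make the idea of SMILES fragmentation concrete, the following is a minimal sketch in Python. The abstract does not define the four fragmentation methods, so everything here is an assumption: `single_symbol_tokens` splits a SMILES string into individual symbols with a simplified regular expression, and `symbol_centered_fragments` is a hypothetical reading of "symbol-centered" fragmentation as a window of neighboring symbols around each position. The function names and the `radius` parameter are illustrative and are not taken from the paper.

```python
import re

# Simplified SMILES symbol pattern (an assumption, not the paper's tokenizer):
# bracket atoms, two-letter halogens, two-digit ring closures, single letters,
# ring digits, and bond/branch/stereo characters.
SMILES_SYMBOL = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\@.:~*$]"
)


def single_symbol_tokens(smiles):
    """Split a SMILES string into individual symbols."""
    tokens = SMILES_SYMBOL.findall(smiles)
    # Guard against characters that the simplified pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def symbol_centered_fragments(smiles, radius=1):
    """Hypothetical symbol-centered fragmentation: for each symbol, join it
    with up to `radius` neighboring symbols on each side."""
    tokens = single_symbol_tokens(smiles)
    return [
        "".join(tokens[max(0, i - radius): i + radius + 1])
        for i in range(len(tokens))
    ]


if __name__ == "__main__":
    print(single_symbol_tokens("c1ccccc1Cl"))
    print(symbol_centered_fragments("CCO", radius=1))
```

Under this reading, the fragments produced for a compound library would form the tokenizer vocabulary, and each molecule would become a sequence of fragment indices fed to the 1D convolutional network. How the paper actually builds its vocabulary and handles overlapping fragments is not stated in the abstract.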

Authors

  • Yumian Zhou
    Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing 211166, Jiangsu, China.
  • Yu He
    Key Laboratory for Analytical Science of Food Safety and Biology, Fujian Provincial Key Laboratory of Analysis and Detection Technology for Food Safety, College of Chemistry, Fuzhou University, Fuzhou, Fujian, 350116, China.
  • Wenzheng Zhou
    Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing 211166, Jiangsu, China.
  • Zhencheng Hua
    Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing 211166, Jiangsu, China.
  • Yijing Wang
    Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention, Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China.
  • Chao Chen
    Department of Neonatology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China.