Machine Learning Models Based on Enlarged Chemical Spaces for Screening Carcinogenic Chemicals.

Journal: Chemical research in toxicology

Published Date: Jul 21, 2025

Abstract

Machine learning (ML) models for screening carcinogenic chemicals are critical for the sound management of chemicals. Previous models were built on small datasets and lacked the applicability domain (AD) characterization necessary for their regulatory application. In the current study, an enlarged dataset of 1697 compounds (940 carcinogens and 757 non-carcinogens) was curated and used to construct screening models based on 12 types of molecular fingerprints, four ML algorithms, and two graph neural networks. The AD of the optimal model was defined with a state-of-the-art AD characterization methodology based on the analysis of structure-activity landscapes (SALs). The optimal model, which combines the random forest algorithm with PubChem fingerprints, outperformed previous models, achieving an area under the receiver operating characteristic curve (AUC) of 86.2% on the validation set restricted to its AD. Coupled with its AD, the optimal model was used to screen the Inventory of Existing Chemical Substances of China (IECSC) and plastic additives datasets, identifying 1282 chemicals from the IECSC and 841 plastic additives as carcinogenic chemicals. The screening model coupled with its AD may serve as a promising tool for prioritizing chemicals of carcinogenic concern and facilitating the sound management of chemicals.
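The evaluation step described above (computing AUC only on validation compounds that fall inside the AD) can be illustrated with a minimal pure-Python sketch. It assumes fingerprints are represented as sets of on-bit indices and that model scores (for example, random forest predicted probabilities) are already available. The AD check here is a simple nearest-neighbour Tanimoto rule with a hypothetical threshold; it is a stand-in chosen for illustration, not the SAL-based AD methodology used in the study, and all function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def in_domain(query: Set[int], training: Sequence[Set[int]], threshold: float) -> bool:
    """Hypothetical similarity-based AD (stand-in, NOT the SAL-based method):
    a query is inside the AD if its most similar training compound has
    Tanimoto similarity >= threshold."""
    return max(tanimoto(query, t) for t in training) >= threshold


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive (label 1) is scored
    higher than a random negative (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_within_ad(
    train_fps: Sequence[Set[int]],
    val_fps: Sequence[Set[int]],
    val_labels: Sequence[int],
    val_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, int]:
    """Keep only validation compounds inside the AD, then compute AUC on them.
    Returns (AUC, number of in-domain compounds)."""
    kept: List[int] = [
        i for i, fp in enumerate(val_fps) if in_domain(fp, train_fps, threshold)
    ]
    labels = [val_labels[i] for i in kept]
    scores = [val_scores[i] for i in kept]
    return roc_auc(labels, scores), len(kept)


if __name__ == "__main__":
    # Toy data (invented for illustration): 1 = carcinogen, 0 = non-carcinogen.
    train_fps = [{1, 2, 3}, {2, 3, 4}, {10, 11}]
    val_fps = [{1, 2, 3, 5}, {2, 3, 4}, {20, 21}, {10, 11, 12}]
    val_labels = [1, 0, 1, 0]
    val_scores = [0.9, 0.2, 0.1, 0.4]  # hypothetical model probabilities

    print("AUC, all validation compounds:", roc_auc(val_labels, val_scores))
    print("AUC, inside AD:", auc_within_ad(train_fps, val_fps, val_labels, val_scores, 0.5))
```

In this toy run, the third validation compound shares no bits with any training fingerprint, so it falls outside the AD; its poor score drags the unrestricted AUC down to 0.5, while the AUC on the three in-domain compounds is 1.0. This mirrors the general rationale for reporting performance on an AD-restricted validation set, although the actual AD definition and threshold in the study differ.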

Authors

Chao Wu
Jingwen Chen

Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China. Electronic address: jwchen@dlut.edu.cn.
Yuxuan Zhang

School of Electrical Engineering, Yanshan University, 438 Hebei Avenue, Qinhuangdao 066004, China. Electronic address: 1535937433@qq.com.
Zhongyu Wang

Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, China.
Zijun Xiao

Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China.
Wenjia Liu

Mechanical Engineering, University of California, Los Angeles (UCLA), Los Angeles, CA, USA.
Haobo Wang

Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing 100871, China.

Keywords

Algorithms; Carcinogens; Humans; Machine Learning; Neural Networks, Computer

External Resources

PubMed ID: 40579351