Modelling Mutagenicity Using Multi-Task Deep Learning and REACH Data.
Journal:
Chemical Research in Toxicology
Published Date:
Jul 18, 2025
Abstract
Under REACH, mutagenicity assessment relies on in vitro testing (gene mutation tests in bacteria and/or mammalian cells, as well as chromosomal aberration or micronucleus assays in mammalian cells), followed by in vivo testing if necessary. This study explored whether the inherent correlation between these assays can be exploited to build multi-task deep learning models, and examined whether such models outperform single-task models. An extensive genotoxicity dataset of over 12,000 substances was compiled, including algorithmically curated REACH data and information from several public sources. Genotoxicity information was also retrieved from ToxValDB and literature sources to construct external (hold-out) test sets for a stringent assessment of the models' generalization performance. A range of single-task and multi-task models was investigated, from classical machine learning techniques using chemical fingerprints to deep learning methods using graphs for molecular structure representation. The best deep learning single-task model achieved a cross-validation balanced accuracy of 73-84% for the four assays and exceeded classical machine learning by 2-8%. Gene mutation detection for specific bacterial strains and metabolic activation modes exhibited a balanced accuracy of 82-85%, with improvements ranging from 7% to 12%. Multi-task deep learning models for specific bacterial strains and metabolic activation modes had on average 8% higher cross-validation test balanced accuracy than single-task models, but were comparable when assay outcomes were aggregated. The best deep learning models for specific bacterial strains and metabolic activation modes showed an external balanced accuracy of 72-78% when there were at least 200 positives and 200 negatives. The dimensionality-reduced molecular embeddings from graph neural network models were able to distinguish positives from negatives and cluster structures that trigger known genotoxicity structural alerts.
The models were also used to identify structural moieties linked to predicted negative genotoxicity in bacteria and positive genotoxicity in mammalian cells.
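The abstract reports every result as balanced accuracy, a metric commonly used when positives and negatives are unevenly represented. As a minimal sketch, assuming the standard binary definition (the mean of sensitivity and specificity) and not the authors' own evaluation code, it can be computed as follows; the function name and the toy labels are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative).

    Assumption: standard binary definition; not taken from the article's code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of positives correctly predicted
    specificity = tn / (tn + fp)  # fraction of negatives correctly predicted
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 2 positives and 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))
```

In this toy case plain accuracy is 8/10, while balanced accuracy is lower because only one of the two positives is detected; each class contributes equally regardless of its size.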