Modelling Mutagenicity Using Multi-Task Deep Learning and REACH Data.

Journal: Chemical Research in Toxicology
Published Date:

Abstract

Under REACH, mutagenicity assessment relies on in vitro testing (gene mutation test in bacteria and/or mammalian cells, as well as chromosomal aberration or micronucleus assays in mammalian cells) followed by in vivo testing if necessary. This study explored the possibility of using the inherent correlation between these assays to create multi-task deep learning models and examined whether they outperform single-task models. An extensive genotoxicity dataset with over 12,000 substances was compiled, including algorithmically curated REACH data and information from several public sources. Genotoxicity information was also retrieved from ToxValDB and literature sources to construct external (hold-out) test sets for a stringent assessment of the models' generalized performance. A range of single-task and multi-task models were investigated, from classical machine learning techniques with chemical fingerprints to deep learning methods using graphs for molecular structure representation. The best deep learning single-task model achieved a cross-validation balanced accuracy of 73-84% for the four assays and exceeded classical machine learning by 2-8%. Gene mutation detection for specific bacterial strains and metabolic activation modes exhibited a balanced accuracy of 82-85%, with improvements ranging from 7% to 12%. Multi-task deep learning models for specific bacterial strains and metabolic activation modes had on average 8% higher cross-validation test balanced accuracy than single-task models but were comparable when assay outcomes were aggregated. The best deep learning models for specific bacterial strains and metabolic activation modes showed external balanced accuracy of 72-78% when there were at least 200 positives and 200 negatives. The dimensionality-reduced molecular embeddings from graph neural network models were able to distinguish positives from negatives and cluster structures that trigger known genotoxicity structural alerts. The models were also used to identify structural moieties linked to predicted negative genotoxicity in bacteria and positive genotoxicity in mammalian cells.
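
The abstract reports every result as balanced accuracy, which is the mean of sensitivity and specificity. The following is a minimal sketch of how that metric could be computed for one assay in a multi-task setting where many substances have no result for a given assay. The function name `balanced_accuracy`, the use of `None` for untested substances, and the toy labels are illustrative assumptions and are not taken from the study.

```python
from typing import Optional, Sequence


def balanced_accuracy(
    y_true: Sequence[Optional[int]], y_pred: Sequence[int]
) -> Optional[float]:
    """Mean of sensitivity and specificity for one binary assay.

    Entries of y_true that are None (assumed here to mean "substance not
    tested in this assay") are skipped. Returns None when either class
    has no labelled example, since the metric is undefined in that case.
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth is None:
            continue  # no experimental outcome for this substance
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        return None
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Hypothetical labels for a single assay; None marks an untested substance.
toy_true = [1, 1, 0, 0, None, 0]
toy_pred = [1, 0, 0, 0, 1, 1]
toy_score = balanced_accuracy(toy_true, toy_pred)
```

With these toy labels the sensitivity is 1/2 and the specificity is 2/3. Unlike plain accuracy, the metric is not inflated by the majority class, which matters for genotoxicity data where negatives typically outnumber positives.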

Authors

  • Panagiotis G Karamertzanis
    Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland.
  • Mike Rasenberg
    European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland.
  • Imran Shah
    National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States.
  • Grace Patlewicz
    National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States.
