Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Computational methods that predict safety- and toxicology-related molecular properties offer an alternative way to expedite drug development and screen environmental chemicals, and can thus significantly reduce the associated time and costs. There is a strong need for, and interest in, computational methods that yield reliable toxicity predictions, and many approaches, including the recently introduced deep neural networks, have been applied toward this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available data set of its kind, comprising >80,000 compounds measured against a total of 59 acute systemic toxicity end points. These data were used to develop multiple single- and multitask models using random forest, deep neural network, convolutional neural network, and graph convolutional neural network approaches. For the first time, we also report consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrate significantly better performance of the consensus model obtained from three multitask learning approaches, which in particular predicted the 29 smaller tasks (fewer than 300 compounds each) better than the other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
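
The abstract does not state how the three multitask models are combined into a consensus. As a minimal sketch, assuming an unweighted mean of per-end-point predictions, the idea could look like the following. The function name `consensus_predict`, the use of `None` for a missing prediction, and the example values are all hypothetical and are not taken from the published code.

```python
from statistics import fmean


def consensus_predict(model_predictions):
    """Average per-end-point predictions across several models.

    model_predictions: one entry per model; each entry is a list of rows
    (one row per compound), and each row holds one value per end point.
    None marks an end point for which a model produced no prediction.
    Assumption: the consensus is a plain unweighted mean.
    """
    n_compounds = len(model_predictions[0])
    consensus = []
    for i in range(n_compounds):
        row = []
        # Group the values that the different models gave for the same end point.
        for endpoint_values in zip(*(preds[i] for preds in model_predictions)):
            available = [v for v in endpoint_values if v is not None]
            # Leave the end point as None if no model predicted it.
            row.append(fmean(available) if available else None)
        consensus.append(row)
    return consensus


# Illustrative values only: one compound, two end points, three models.
dnn = [[2.0, 1.0]]
cnn = [[3.0, None]]
gcn = [[4.0, None]]
result = consensus_predict([dnn, cnn, gcn])
```

Averaging only the available values lets an end point covered by a single model still receive a consensus value. A weighted scheme would be an equally plausible design.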

Authors

  • Sankalp Jain
    Department of Pharmaceutical Chemistry, Division of Drug Design and Medicinal Chemistry, University of Vienna, 1090 Vienna, Austria.
  • Vishal B Siramshetty
    Structural Bioinformatics Group, Experimental and Clinical Research Center (ECRC), Charité - University Medicine Berlin, Berlin, Germany; BB3R - Berlin Brandenburg 3R Graduate School, Free University of Berlin, Berlin, Germany.
  • Vinicius M Alves
    Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
  • Eugene N Muratov
    Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, United States of America.
  • Nicole Kleinstreuer
    National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, NIEHS, Durham, North Carolina 27560, USA.
  • Alexander Tropsha
    Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
  • Marc C Nicklaus
    Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States.
  • Anton Simeonov
    National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, USA.
  • Alexey V Zakharov
    National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States.