Multitask Deep Learning Models of Combined Industrial Absorption, Distribution, Metabolism, and Excretion Datasets to Improve Generalization.

Journal: Molecular pharmaceutics
Published Date:

Abstract

The optimization of absorption, distribution, metabolism, and excretion (ADME) profiles of compounds is critical to the drug discovery process. As such, machine learning (ML) models for ADME are widely used to prioritize the design and synthesis of compounds. The effectiveness of these models depends on the availability of high-quality experimental data for a diverse set of compounds relevant to the emerging chemical space being explored by drug discovery teams. To that end, ADME data sets from Genentech and Roche were combined to evaluate the impact of expanding the chemical space on the performance of ML models, a first experiment of its kind for large-scale, historical ADME data sets. The combined ADME data set consisted of over 1 million individual measurements distributed across 11 assay end points. We utilized a multitask (MT) neural network architecture that models multiple end points simultaneously and thereby exploits information transfer between interconnected ADME end points. Both single- and cross-site MT models were trained and compared against single-site, single-task baseline models. Given the differences in assay protocols between the two sites, the data for corresponding end points at each site were modeled as separate tasks. Models were evaluated against test sets representing varying degrees of extrapolation difficulty, including cluster-based, temporal, and external test sets. Cross-site MT models appeared to generalize better than single-site models. The performance improvement of the cross-site MT models was more pronounced for the relatively "distant" external and temporal test sets, suggesting an expanded applicability domain. The data exchange exercise described here demonstrates the value of learning from ADME data from multiple sources without the need to aggregate such data when the experimental methods are disparate.
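
The task layout described in the abstract, in which corresponding end points from the two sites are kept as separate tasks and each compound is measured in only some assays, can be illustrated with a masked multitask loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: the site names, end point names, label values, and the choice of mean squared error are all hypothetical, and the neural network itself is omitted.

```python
import math

# Hypothetical task layout: every (site, end point) pair is its own task,
# so measurements made under different assay protocols are never pooled.
SITES = ["site_A", "site_B"]              # placeholder site names
ENDPOINTS = ["endpoint_1", "endpoint_2"]  # placeholder end point names
TASKS = [(site, endpoint) for site in SITES for endpoint in ENDPOINTS]


def masked_mse(predictions, labels):
    """Mean squared error over measured labels only.

    Rows are compounds and columns follow the order of TASKS. A NaN label
    means the compound was not measured for that task, so it adds nothing
    to the loss.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if math.isnan(label):
                continue  # skip unmeasured (compound, task) pairs
            total += (pred - label) ** 2
            count += 1
    return total / count if count else 0.0


nan = float("nan")

# Two illustrative compounds: the first was measured only at site_A,
# the second only at site_B. All values are made up.
labels = [
    [1.0, nan, nan, nan],
    [nan, nan, 2.0, 0.5],
]
predictions = [
    [1.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.5],
]

loss = masked_mse(predictions, labels)
```

Because unmeasured entries are masked out, a shared model can be trained on compounds from both sites even though no single compound needs labels for every task.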

Authors

  • Joseph A Napoli
    Drug Metabolism & Pharmacokinetics (DMPK), Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States.
  • Michael Reutlinger
    F. Hoffmann-La Roche Ltd., Pharma Research & Early Development (pRED), Roche Innovation Center Basel, Grenzacherstrasse 124, 4070 Basel, Switzerland.
  • Patricia Brandl
    F. Hoffmann-La Roche Ltd., Pharma Research & Early Development (pRED), Roche Innovation Center Basel, Grenzacherstrasse 124, 4070 Basel, Switzerland.
  • Wenyi Wang
    Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, 77030, TX, USA. wwang7@mdanderson.org.
  • Jérôme Hert
    F. Hoffmann-La Roche Ltd., Pharma Research & Early Development (pRED), Roche Innovation Center Basel, Grenzacherstrasse 124, 4070 Basel, Switzerland.
  • Prashant Desai
    Drug Metabolism & Pharmacokinetics (DMPK), Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States.