Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World.

Journal: Chemical research in toxicology
PMID:

Abstract

Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, the translational relevance of toxicity-related end points has always been challenging to evaluate experimentally because of the resources required for human and animal studies, which has limited data availability in the field. ML can augment or even potentially replace traditional experimental processes, depending on the project phase and the specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target activity or lack of efficacy due to unfavorable pharmacokinetics). However, reliance on ML is not without risks: biases can stem from nonrepresentative training data, an algorithm poorly suited to the underlying data, or poor model building and validation approaches. These issues can lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of ML models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.

Authors

  • Srijit Seal
    Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States.
  • Manas Mahale
    Department of Pharmaceutical Chemistry, Bombay College of Pharmacy, Mumbai 400098, India.
  • Miguel Garcia-Ortegon
    Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
  • Chaitanya K Joshi
  • Layla Hosseini-Gerami
    Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK. ab454@cam.ac.uk.
  • Alex Beatson
    Axiom Bio, San Francisco, California 94107, United States.
  • Matthew Greenig
    Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
  • Mrinal Shekhar
    School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States.
  • Arijit Patra
  • Caroline Weis
    Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland. caroline.weis@bsse.ethz.ch.
  • Arash Mehrjou
    GSK, London WC1A 1DG, U.K.
  • Adrien Badré
    School of Computer Science, University of Oklahoma, Norman, OK, USA.
  • Brianna Paisley
    Eli Lilly & Company, Indianapolis, Indiana 46285, United States.
  • Rhiannon Lowe
    Relation Therapeutics, London NW1 3BG, U.K.
  • Shantanu Singh
    Imaging Platform, Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA 02142, USA.
  • Falgun Shah
    Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States.
  • Bjarki Johannesson
    The New York Stem Cell Foundation Research Institute, New York, NY, USA. johannesson.bjarki@gmail.com.
  • Dominic Williams
    AstraZeneca, Pepparedsleden 1, 43183 Molndal, Sweden.
  • David Rouquié
    Bayer SAS, Bayer Crop Science, 355 rue Dostoïevski, CS 90153, 06906, Valbonne, Sophia Antipolis Cedex, France. david.rouquie@bayer.com.
  • Djork-Arné Clevert
    Department of Bioinformatics, Bayer AG, Berlin, Germany. Email: robin.winter@bayer.com.
  • Patrick Schwab
    F Hoffmann-La Roche Ltd, Basel, Switzerland.
  • Nicola Richmond
    Recursion, London N1C 4AG, U.K.
  • Christos A Nicolaou
  • Raymond J Gonzalez
    Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States.
  • Russell Naven
    Pfizer, Groton, CT, USA.
  • Carolin Schramm
    Sanofi, Babraham Research Campus, Cambridge CB22 3AT, U.K.
  • Lewis R Vidler
    Eli Lilly and Company, Bracknell RG12 1PU, U.K.
  • Kamel Mansouri
    National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States.
  • W Patrick Walters
    Relay Therapeutics, Cambridge, MA, USA.
  • Deidre Dalmas Wilk
    Nonclinical Safety, Collegeville, Pennsylvania 19426, United States.
  • Ola Spjuth
    Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124, Uppsala, Sweden.
  • Anne E Carpenter
    The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, United States. Electronic address: anne@broadinstitute.org.
  • Andreas Bender
    Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK. ab454@cam.ac.uk.