Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening.

Journal: Journal of Chemical Information and Modeling

Published Date: Sep 10, 2018

Abstract

The versatility of similarity searching and quantitative structure-activity relationships to model the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in a scenario common in early-stage drug discovery, where many inactive data points but no active ones are available (i.e., extrapolation from the low-activity to the high-activity range), has not yet been thoroughly examined. To this end, we designed an iterative virtual screening strategy and evaluated it on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds in identifying a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations needed to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only provides output values in the range covered by the training set. In addition, examination of the scaffold diversity in the data sets used shows that in some cases similarity searching and RF require twice as many iterations as random selection, depending on the chemical space covered in the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets. Overall, this study provides an approach for iterative screening to discover highly potent compounds when only inactive data are available in the early stages of drug discovery, and identifies the best experimental setup in which to do so.
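The iterative selection loop and the extrapolation argument in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the single numeric descriptor, the noise-free synthetic activities, the potency threshold, and all function names are assumptions made for illustration, and a nearest-neighbour average stands in for any predictor whose output is bounded by the training activities (as the abstract states for RF).

```python
import random

def fit_line(points):
    """Least-squares fit of activity = a + b * descriptor; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:  # all descriptors identical: fall back to the mean activity
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict_linear(train, x):
    """Linear prediction; can exceed the highest activity seen in training."""
    a, b = fit_line(train)
    return a + b * x

def predict_neighbours(train, x, k=3):
    """Mean activity of the k nearest training compounds.
    Stand-in for a bounded predictor: never exceeds the training maximum."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def pick_top_predicted(predict):
    """Selection rule: choose the pool compound with the highest prediction."""
    def pick(train, pool):
        return max(range(len(pool)), key=lambda i: predict(train, pool[i][0]))
    return pick

def pick_random(rng):
    """Selection rule: choose a pool compound uniformly at random."""
    return lambda train, pool: rng.randrange(len(pool))

def rounds_to_hit(train, pool, threshold, pick):
    """Pick one compound per round, 'measure' it, and add it to the training
    set; return the number of rounds until activity >= threshold is found."""
    train, pool = list(train), list(pool)
    rounds = 0
    while pool:
        rounds += 1
        x, y = pool.pop(pick(train, pool))
        if y >= threshold:
            return rounds
        train.append((x, y))
    return None  # pool exhausted without finding an active compound

def activity(x):
    """Hypothetical, noise-free activity used only to build the toy data."""
    return 4.0 + 0.5 * x

# Training set contains only low-activity compounds (activities 4.0 to 6.0).
train = [(x, activity(x)) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
# The pool hides one highly active compound (descriptor 10.0, activity 9.0).
pool = [(x, activity(x)) for x in (0.5, 1.5, 2.5, 3.5, 10.0)]
THRESHOLD = 8.0  # assumed potency cut-off

linear_rounds = rounds_to_hit(train, pool, THRESHOLD, pick_top_predicted(predict_linear))
bounded_rounds = rounds_to_hit(train, pool, THRESHOLD, pick_top_predicted(predict_neighbours))
random_rounds = rounds_to_hit(train, pool, THRESHOLD, pick_random(random.Random(0)))

print("linear:", linear_rounds, "bounded:", bounded_rounds, "random:", random_rounds)
```

In this toy setting the linear model ranks the out-of-range compound first because its prediction grows beyond the training activities, whereas the bounded predictor cannot score any compound above the training maximum. Real data sets are noisy and high-dimensional, so the sketch only illustrates the mechanism, not the reported benchmark results.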

Authors

Isidro Cortes-Ciriano

Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, Ile de France, France.

Nicholas C Firth

Centre for Medical Image Computing, Department of Computer Science, UCL, London WC1E 6BT, United Kingdom.

Andreas Bender

Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK. ab454@cam.ac.uk

Oliver Watson

Evariste Technologies Ltd, Goring on Thames RG8 9AL, United Kingdom.

Keywords

Algorithms; Drug Discovery; Drug Evaluation, Preclinical; Machine Learning; Quantitative Structure-Activity Relationship

External Resources

PubMed ID: 30130102
