Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

The volume of high-throughput screening data has increased considerably since the advent of automated biochemical and cell-based assays. This information-rich data source offers tremendous opportunities for repurposing through data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput screening fingerprints (HTSFPs), a new type of descriptor that captures molecular bioactivity profiles and can be applied in virtual screening, iterative screening, and target deconvolution. So far, however, studies combining HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and none has reported modeling compounds' biochemical assay activities toward a panel of target proteins. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus single-task support vector machines, in terms of both hit identification and scaffold-hopping potential. The performance of the two HTSFP models is reported relative to that of multitask deep learning and support vector machine models built with the structural descriptor ECFP. Moreover, we investigated the effect of high-throughput screening false positives and false negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance but diverse hits with very little overlap, demonstrating that bioactivity profile-based descriptors are orthogonal to structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold-hopping potential of the predictive models.
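
The idea of compiling assay results into a bioactivity-profile fingerprint, and of measuring how little two hit lists overlap, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not describe how the in-house HTSFPs are actually constructed, so the function names `build_htsfp` and `hit_overlap`, the use of one activity score per assay, the fill value for untested assays, and the Jaccard index as the overlap measure are all hypothetical choices rather than the authors' method.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_htsfp(
    records: Iterable[Tuple[str, str, float]],
    assay_ids: List[str],
    missing: float = 0.0,
) -> Dict[str, List[float]]:
    """Compile (compound, assay, score) records into one vector per compound.

    Each vector position corresponds to one assay in `assay_ids`.
    Assays in which a compound was never tested keep the `missing` value
    (an assumed placeholder, not a convention taken from the paper).
    """
    index = {assay: i for i, assay in enumerate(assay_ids)}
    fingerprints: Dict[str, List[float]] = {}
    for compound, assay, score in records:
        if assay not in index:
            continue  # ignore assays outside the chosen fingerprint panel
        fp = fingerprints.setdefault(compound, [missing] * len(assay_ids))
        fp[index[assay]] = score
    return fingerprints


def hit_overlap(hits_a: Set[str], hits_b: Set[str]) -> float:
    """Jaccard index of two hit lists (0.0 means no shared hits)."""
    union = hits_a | hits_b
    if not union:
        return 0.0
    return len(hits_a & hits_b) / len(union)


# Hypothetical toy data: activity scores for two compounds in three assays.
records = [("cpd1", "A1", 2.5), ("cpd1", "A3", -0.4), ("cpd2", "A2", 1.1)]
fingerprints = build_htsfp(records, ["A1", "A2", "A3"])

# Hypothetical hit lists from an HTSFP-based and an ECFP-based model.
overlap = hit_overlap({"cpd1", "cpd7"}, {"cpd7", "cpd9"})
```

A low `hit_overlap` value between the hit lists of two models is one simple way to express the kind of complementarity the abstract reports between HTSFP-based and ECFP-based predictions.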

Authors

  • Noé Sturm
    Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden.
  • Jiangming Sun
    Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden.
  • Yves Vandriessche
    Intel Corporation, Data Center Group, Veldkant 31, 2550 Kontich, Belgium.
  • Andreas Mayr
    Department of Medical Biometry, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Bonn, Germany.
  • Günter Klambauer
    ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, A-4040 Linz, Austria.
  • Lars Carlsson
    Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden.
  • Ola Engkvist
    Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden.
  • Hongming Chen
    Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden.