Widespread false negatives in DNA-encoded library data: how linker effects impair machine learning-based lead prediction.

Journal: Chemical Science
Published Date:

Abstract

DNA-encoded chemical libraries (DECLs) have become integral to early-stage drug discovery, yielding active compounds and extensive labeled datasets for machine learning (ML)-based prediction of bioactive molecules. However, the information content of DECL selection data remains largely unexplored. This study presents the first systematic investigation of the prevalence of false negatives and the influence of the linker in DECL data. Using a focused DECL targeting the poly-(ADP-ribose) polymerases PARP1/2 and TNKS1/2 as a model system, we found that our DECL selections frequently miss active compounds, with numerous false negatives for each identified hit. The presence of the DNA-conjugation linker emerged as a factor contributing to the underdetection of active molecules. This bias toward false negatives compromises the predictive power of DECL data for prioritizing hits, anticipating target selectivity, and training ML models, as determined by analyzing the effects of undersampling and oversampling techniques when training models on the PARP2 data. Conversely, the linker's presence in DECLs offers advantages, such as enabling the identification of target-selective protein engagers, even when the underlying molecules themselves may not be selective. These findings highlight the challenges and opportunities of DECL data, emphasizing the need for best practices in data handling and ML model development in drug discovery.

Authors

  • Alba L Montoya
    Department of Medicinal Chemistry, College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, UT 84112, USA. raphael.franzini@utah.edu
  • Adam S Hogendorf
    Department of Medicinal Chemistry, College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, UT 84112, USA. raphael.franzini@utah.edu
  • Steven Tingey
    Waterford School, 1480 E 9400 S, Sandy, UT 84093, USA.
  • Aadarsh Kuberan
    West High School, 241 N 300 W, Salt Lake City, UT 84103, USA.
  • Lik Hang Yuen
    Department of Medicinal Chemistry, College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, UT 84112, USA. raphael.franzini@utah.edu
  • Herwig Schüler
    Center for Molecular Protein Science, Department of Chemistry, Lund University, Lund 22100, Sweden.
  • Raphael M Franzini
    Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah 84112, United States.