The Last Mile Problem: A Critical Assessment of Physics-Based and AI Tools for Small Molecule Binding Prediction in Virtual Screening.

Journal: Journal of Chemical Information and Modeling

Abstract

Docking-based virtual screening (VS) is essential for hit finding in the early stages of drug or probe discovery. However, it remains prone to high false-positive rates, which often lead to unsuccessful screening campaigns. Molecular dynamics (MD)-based alchemical free-energy methods promise to improve VS hit rates but are highly resource-intensive. Real-world and benchmark studies that incorporate alchemical absolute binding free energy (ABFE) calculations could help optimize their use in VS pipelines. Here, we present a large-scale benchmark that evaluates the value of ABFE calculations in VS workflows relative to cheaper alternatives. Two data sets were used: a curated set of 632 ligand-protein complexes from the PDBbind database to assess the quantitative accuracy of ABFE, and a set of 315 binders and decoys from the Directory of Useful Decoys, Enhanced (DUD-E) to evaluate predictive power in a VS context. Alongside alchemical ABFE, we benchmarked computationally affordable end-state physics-based methods and five machine-learning (ML) models. The performance ranking of the binding free energy (BFE) predictors was consistent with their computational cost, with alchemical ABFE performing well across both benchmarks. End-state methods distinguished actives from decoys well in the DUD-E data set but showed little correlation with experimental values in PDBbind. Most ML models performed well on PDBbind, likely because of overlap with their training data, but failed on DUD-E; the exceptions were GNINA and Boltz-2, which demonstrated a degree of generalization comparable to that of end-state physics-based methods. Overall, a staged approach that uses Boltz-2 as a primary filter followed by alchemical ABFE is likely to robustly and cost-efficiently enrich docking-based VS hit lists with true actives.
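
The staged approach described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: `staged_screen`, `enrichment_factor`, `keep_fraction`, and `top_n` are hypothetical names, the two scoring callables stand in for a cheap primary filter and an expensive rescoring step (no real Boltz-2 or ABFE API is called), and lower scores are assumed to mean stronger predicted binding.

```python
import math
from typing import Callable, Collection, List, Sequence


def staged_screen(
    ligands: Sequence[str],
    cheap_score: Callable[[str], float],
    costly_score: Callable[[str], float],
    keep_fraction: float = 0.2,
    top_n: int = 10,
) -> List[str]:
    """Two-stage ranking: cheap prefilter, then costly rescoring of survivors.

    Assumption: both scores are free-energy-like, so lower means stronger
    predicted binding. The costly scorer is only called on the survivors.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Stage 1: rank the whole library with the cheap predictor.
    prefiltered = sorted(ligands, key=cheap_score)
    n_keep = max(1, math.ceil(keep_fraction * len(ligands)))
    survivors = prefiltered[:n_keep]
    # Stage 2: spend the expensive calculation only on the survivors.
    return sorted(survivors, key=costly_score)[:top_n]


def enrichment_factor(
    selected: Sequence[str], actives: Collection[str], library_size: int
) -> float:
    """Hit rate in the selection divided by the hit rate in the full library."""
    if not selected or not actives or library_size <= 0:
        raise ValueError("selected, actives, and library_size must be non-empty")
    hit_rate_selected = sum(1 for lig in selected if lig in actives) / len(selected)
    hit_rate_library = len(actives) / library_size
    return hit_rate_selected / hit_rate_library
```

In practice the two callables would wrap whatever predictors are available, and `keep_fraction` sets the trade-off between the cost of the second stage and the risk of discarding true actives in the first.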
