Delta Machine Learning to Improve Scoring-Ranking-Screening Performances of Protein-Ligand Scoring Functions.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Protein-ligand scoring functions are widely used in structure-based drug design for fast evaluation of protein-ligand interactions, and there is strong interest in developing scoring functions with machine-learning approaches. In this work, by expanding the training set, developing physically meaningful features, employing our recently developed linear empirical scoring function Lin_F9 (Yang, C. J. Chem. Inf. Model. 2021, 61, 4630-4644) as the baseline, and applying extreme gradient boosting (XGBoost) with Δ-machine learning, we have further improved the robustness and applicability of machine-learning scoring functions. In addition to top performance in the scoring, ranking, and screening power tests of the CASF-2016 benchmark, the new scoring function ΔXGB achieves superior scoring and ranking performance on different structure types that mimic real docking applications. The scoring powers of ΔXGB for locally optimized poses, flexible redocked poses, and ensemble docked poses of the CASF-2016 core set correspond to Pearson's correlation coefficient (R) values of 0.853, 0.839, and 0.813, respectively. In addition, a large-scale docking-based virtual screening test on the LIT-PCBA data set demonstrates the reliability and robustness of ΔXGB in virtual screening applications. The ΔXGB scoring function and its code are freely available at https://yzhang.hpc.nyu.edu/Delta_LinF9_XGB.
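
The Δ-machine learning idea named in the abstract can be illustrated with a minimal sketch: a model is trained on the difference between the experimental affinity and a baseline score, and the final prediction is the baseline plus the learned correction. Everything below is an assumption made for illustration only. The toy numbers are invented, the single `feature` column is hypothetical, and a one-variable least-squares fit stands in for XGBoost so that the example runs with the Python standard library alone. It is not the authors' implementation of ΔXGB or Lin_F9.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_linear(xs, ys):
    """Stand-in corrector: ordinary least squares on one feature.

    In the paper's setting this role is played by XGBoost; a linear fit is
    swapped in here only to keep the sketch dependency-free.
    """
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


# Invented toy data (hypothetical values, not taken from any benchmark).
experimental = [6.2, 7.5, 4.8, 8.1, 5.5]   # measured binding affinities
baseline = [5.9, 6.4, 5.2, 6.9, 5.6]       # scores from a baseline function
feature = [0.2, 1.0, -0.5, 1.3, -0.1]      # one hypothetical descriptor

# Delta-learning target: what the baseline gets wrong for each complex.
delta_targets = [e - b for e, b in zip(experimental, baseline)]

# Train the corrector on the residuals, not on the affinities themselves.
corrector = fit_linear(feature, delta_targets)

# Final score = baseline score + predicted correction.
final = [b + corrector(f) for b, f in zip(baseline, feature)]

# Scoring power expressed as Pearson's R. This is computed on the training
# data, so it only demonstrates the mechanics, not real predictive quality.
r_baseline = pearson_r(baseline, experimental)
r_final = pearson_r(final, experimental)
print(f"baseline R = {r_baseline:.3f}, delta-corrected R = {r_final:.3f}")
```

Learning only the correction lets the baseline carry the physically motivated part of the score, while the learned model accounts for what the baseline misses. A real evaluation would score a held-out set, such as a benchmark core set, rather than the training data.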

Authors

  • Chao Yang
    Translational Institute for Cancer Pain, Chongming Hospital Affiliated to Shanghai University of Health & Medicine Sciences (Xinhua Hospital Chongming Branch), Shanghai 202155, P. R. China.
  • Yingkai Zhang
    Department of Chemistry, New York University, New York, New York 10003, United States.