Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, how the similarity between training and test samples affects this trend has not been studied systematically. It is therefore unclear how these SFs would perform when trained only on protein-ligand complexes that are highly dissimilar or highly similar to those in the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can likewise improve their accuracy as the training set grows, and to what extent they learn from dissimilar or similar training complexes.

Authors

  • Hongjian Li
    Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, China.
  • Jiangjun Peng
    Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China. andrew.pengjj@gmail.com.
  • Pavel Sidorov
    Cancer Research Center of Marseille CRCM, INSERM, Institut Paoli-Calmettes, Aix-Marseille University, CNRS, F-13009 Marseille, France.
  • Yee Leung
    Institute of Future Cities, Chinese University of Hong Kong, Shatin, Hong Kong.
  • Kwong-Sak Leung
    Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, China.
  • Man-Hon Wong
    Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, China.
  • Gang Lu
    Innovation Research Institute of Combined Acupuncture and Medicine, Shaanxi University of CM, Xianyang 712046, China; Shaanxi Key Laboratory of Combined Acupuncture and Medicine, Xianyang 712046.
  • Pedro J Ballester
    Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France.