Topology-Based and Conformation-Based Decoys Database: An Unbiased Online Database for Training and Benchmarking Machine-Learning Scoring Functions.

Journal: Journal of Medicinal Chemistry
Published Date:

Abstract

Machine-learning-based scoring functions (MLSFs) have gained attention for their potential to improve the accuracy of binding affinity prediction and structure-based virtual screening (SBVS) relative to classical scoring functions (SFs). Developing accurate MLSFs for SBVS requires a large, unbiased dataset that includes structurally diverse actives and decoys. Unfortunately, most existing datasets suffer from hidden biases and insufficient data. Here, we developed the topology-based and conformation-based decoys database (ToCoDDB). The biological targets and active ligands in ToCoDDB were collected from the scientific literature and established datasets. The decoys were generated and debiased using conditional recurrent neural networks and molecular docking. ToCoDDB is presently the largest unbiased database, with 2.4 million decoys covering 155 targets. Detailed information and a performance benchmark are provided for each target, both of which are useful for training and evaluating MLSFs. Moreover, the online decoy generation function of ToCoDDB extends its application range to any target. ToCoDDB is freely available at http://cadd.zju.edu.cn/tocodecoy/.

Authors

  • Xujun Zhang
    Injury Prevention Research Institute, Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing, Jiangsu Province, China.
  • Chao Shen
    Department of Epidemiology, School of Public Health, Soochow University, Suzhou 215123, China.
  • Tianyue Wang
    Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; School of Chemical and Environmental Engineering, Beijing Campus, China University of Mining and Technology, Beijing 100083, China.
  • Yu Kang
    College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China.
  • Dan Li
    State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan 610041, PR China.
  • Peichen Pan
    Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China. Electronic address: panpeichen@zju.edu.cn.
  • Jike Wang
    School of Computer Science, Wuhan University, Wuhan, Hubei 430072, China.
  • Gaoang Wang
  • Yafeng Deng
    Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China.
  • Lei Xu
    Key Laboratory of Biomedical Information Engineering of the Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
  • Dongsheng Cao
    School of Pharmaceutical Sciences, Central South University, Changsha, China. oriental-cds@163.com.
  • Tingjun Hou
    College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.
  • Zhe Wang
    Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China.