Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches.

Journal: Science China. Life sciences
PMID:

Abstract

Artificial intelligence (AI) models usually require large amounts of high-quality training data, which is in striking contrast to the small and biased data available to current drug discovery pipelines. Federated learning has been proposed as a way to utilize distributed data from different sources without leaking sensitive information. This emerging decentralized machine learning paradigm is expected to dramatically improve the success rate of AI-powered drug discovery. Here, we simulated the federated learning process with different property and activity datasets from different sources, among which overlapping molecules have recorded values with high or low biases. Beyond the benefit of gaining more data, we also demonstrated that federated training has a regularization effect that makes it superior to centralized training on the pooled datasets when biases are high. Moreover, we compared the effects of different client network architectures and coordinator aggregation algorithms on the performance of federated learning, and personalized federated learning showed promising results. Our work demonstrates the applicability of federated learning in predicting drug-related properties and highlights its promising role in addressing the small and biased data dilemma in drug discovery.
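To make the coordinator's role concrete, the following is a minimal sketch of federated averaging (FedAvg), a standard aggregation rule in which the coordinator averages client parameters weighted by each client's number of training samples. It is not necessarily the aggregation algorithm used in this study. The function name `fedavg`, the dict-of-lists parameter format, and the optional `shared_keys` argument (which mimics personalized schemes where only some layers are aggregated and the rest stay local on each client) are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical format: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def fedavg(
    updates: List[Tuple[Params, int]],
    shared_keys: Optional[Iterable[str]] = None,
) -> Params:
    """Average client parameters, weighting each client by its sample count.

    If shared_keys is given, only those parameters are aggregated; any other
    parameters are left out of the result and would stay local on each client.
    """
    if not updates:
        raise ValueError("no client updates to aggregate")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("clients report no training samples")

    # Aggregate every parameter unless a personalized subset is requested.
    keys = list(shared_keys) if shared_keys is not None else list(updates[0][0])

    averaged: Params = {}
    for name in keys:
        acc = [0.0] * len(updates[0][0][name])
        for params, n in updates:
            weight = n / total  # this client's share of all training samples
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        averaged[name] = acc
    return averaged


if __name__ == "__main__":
    client_a = ({"w": [1.0, 2.0], "head": [10.0]}, 100)
    client_b = ({"w": [3.0, 4.0], "head": [20.0]}, 300)
    print(fedavg([client_a, client_b]))
    print(fedavg([client_a, client_b], shared_keys=["w"]))
```

For example, with two clients holding 100 and 300 molecules, the coordinator weights their parameters 0.25 and 0.75, so the shared weights [1.0, 2.0] and [3.0, 4.0] average to [2.5, 3.5]. Restricting aggregation to `shared_keys=["w"]` leaves each client's `head` untouched, which is one simple way to personalize models on top of a shared representation.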

Authors

  • Zhaoping Xiong
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China.
  • Ziqiang Cheng
    Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai, 200031, China.
  • Xinyuan Lin
    Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China.
  • Chi Xu
    Hamlyn Centre of Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London, UK.
  • Xiaohong Liu
    Department of Biopharmaceutics, School of Pharmacy, Shenyang Pharmaceutical University, Wenhua Road, Shenyang 110016, China. Electronic address: lvj221@163.com.
  • Dingyan Wang
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
  • Xiaomin Luo
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
  • Yong Zhang
    Outpatient Department of Hepatitis, The Sixth Affiliated People's Hospital of Dalian Medical University, Dalian, Liaoning, China.
  • Hualiang Jiang
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
  • Nan Qiao
    Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China. qiaonan3@huawei.com.
  • Mingyue Zheng
    School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China.