An effective self-supervised framework for learning expressive molecular global representations to drug discovery.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Producing expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from scarce labeled data and poor generalization. Here, we propose a novel graph-based deep learning framework for molecular pre-training, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a powerful GNN for modeling molecular graphs, named MolGNet, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet captures valuable chemical insights and produces interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction prediction and drug-target interaction prediction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.

Authors

  • Pengyong Li
    Department of Biomedical Engineering at Tsinghua University.
  • Jun Wang
    Department of Speech, Language, and Hearing Sciences and the Department of Neurology, The University of Texas at Austin, Austin, TX 78712, USA.
  • Yixuan Qiao
    Operations Research and Cybernetics at Beijing University of Technology, China.
  • Hao Chen
    The First School of Medicine, Wenzhou Medical University, Wenzhou, China.
  • Yihuan Yu
    Beijing University of Biomedical Engineering, China.
  • Xiaojun Yao
    Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, PR China.
  • Peng Gao
    Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, PA, United States.
  • Guotong Xie
    Ping An Health Technology, Beijing, China.
  • Sen Song