Self-Supervised Contrastive Molecular Representation Learning with a Chemical Synthesis Knowledge Graph.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Self-supervised molecular representation learning has shown great promise in bridging machine learning and chemical science to accelerate the development of new drugs. Because reaction data are limited, most existing methods are pretrained by augmenting the intrinsic topology of molecules without effectively incorporating prior knowledge about chemical reactions, which makes it difficult for them to generalize to reaction-related tasks. To address this issue, we propose ReaKE, a reaction knowledge embedding framework that formulates chemical reactions as a knowledge graph. Specifically, we construct a chemical synthesis knowledge graph with reactants and products as nodes and reaction rules as edges. Based on this knowledge graph, we further propose a novel contrastive learning scheme at both the molecule and reaction levels to capture reaction-related functional group information within and between molecules. Extensive experiments demonstrate the effectiveness of ReaKE compared with state-of-the-art methods on several downstream tasks, including reaction classification, product prediction, and yield prediction.
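
As a rough illustration of the graph structure described above (reactants and products as nodes, reaction rules as edges), the following minimal Python sketch stores reactions as (reactant, rule, product) triples. It is not the authors' implementation: the names `Triple` and `build_graph`, the rule labels, and the toy molecules are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of a hypothetical synthesis knowledge graph."""
    reactant: str  # head node, written as a SMILES string
    rule: str      # edge label; an illustrative reaction-rule name
    product: str   # tail node, written as a SMILES string


def build_graph(triples):
    """Index triples as reactant -> list of (rule, product) edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.reactant].append((t.rule, t.product))
    return dict(graph)


# Toy data chosen for illustration only; not taken from the paper.
toy_triples = [
    Triple("CC(=O)O", "esterification", "CC(=O)OCC"),  # acetic acid -> ethyl acetate
    Triple("CCO", "esterification", "CC(=O)OCC"),      # ethanol -> ethyl acetate
    Triple("CCO", "oxidation", "CC=O"),                # ethanol -> acetaldehyde
]

graph = build_graph(toy_triples)

# Every molecule that appears as a reactant or a product is a node.
nodes = {t.reactant for t in toy_triples} | {t.product for t in toy_triples}
```

In this sketch a molecule such as ethanol (`CCO`) has one outgoing edge per reaction rule it takes part in, which is the kind of structure an embedding or contrastive objective could then be trained over.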

Authors

  • Jiancong Xie
    School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China.
  • Yi Wang
    Department of Neurology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China.
  • Jiahua Rao
    School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China.
  • Shuangjia Zheng
    Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou 510006, China.
  • Yuedong Yang
    Institute for Glycomics and School of Information and Communication Technique, Griffith University, Parklands Dr. Southport, QLD 4222, Australia.