BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Although substantial efforts have been made using graph neural networks (GNNs) for artificial intelligence (AI)-driven drug discovery, effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, making them time-consuming, computationally expensive and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored, complementary and asymmetric graph autoencoders that reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the quality of the learned molecular representations. BatmanNet achieves state-of-the-art results on multiple drug discovery tasks, including molecular property prediction, drug-drug interaction prediction and drug-target interaction prediction, across 13 benchmark datasets, demonstrating its potential for molecular representation learning.
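
The masking step that the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_graph`, the `mask_ratio` value and the choice to mask nodes and edges independently are assumptions made for illustration. It only shows how a molecular graph might be split into a visible part (the encoder input) and a masked part (the reconstruction targets for the node branch and the edge branch).

```python
import random

def mask_graph(num_nodes, edges, mask_ratio=0.5, seed=0):
    """Hypothetical helper: randomly hide a fraction of nodes and edges.

    Returns (visible_nodes, masked_nodes, visible_edges, masked_edges).
    Nodes and edges are masked independently here, which is an assumption;
    a visible edge may therefore still touch a masked node.
    """
    rng = random.Random(seed)

    # Choose which atoms (nodes) to hide from the encoder.
    n_mask_nodes = int(round(mask_ratio * num_nodes))
    masked_nodes = set(rng.sample(range(num_nodes), n_mask_nodes))

    # Choose which bonds (edges) to hide from the encoder.
    n_mask_edges = int(round(mask_ratio * len(edges)))
    masked_edge_idx = set(rng.sample(range(len(edges)), n_mask_edges))

    visible_nodes = [i for i in range(num_nodes) if i not in masked_nodes]
    visible_edges = [e for k, e in enumerate(edges) if k not in masked_edge_idx]
    masked_edges = [e for k, e in enumerate(edges) if k in masked_edge_idx]
    return visible_nodes, sorted(masked_nodes), visible_edges, masked_edges

# Toy graph: the heavy atoms of ethanol (C-C-O), bonds as index pairs.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
vis_n, msk_n, vis_e, msk_e = mask_graph(len(atoms), bonds)
print("visible atoms:", vis_n, "masked atoms:", msk_n)
print("visible bonds:", vis_e, "masked bonds:", msk_e)
```

In a masked autoencoder of this kind, the masked atoms and bonds would serve as the prediction targets, and the model would see only the visible part of the graph.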

Authors

  • Zhen Wang
    Department of Otolaryngology, Longgang Otolaryngology Hospital & Shenzhen Key Laboratory of Otolaryngology, Shenzhen Institute of Otolaryngology, Shenzhen, Guangdong, China.
  • Zheng Feng
    Intelligent Critical Care Center, University of Florida, Gainesville.
  • Yanjun Li
    NSF Center for Big Learning, University of Florida, Gainesville, FL.
  • Bowen Li
    Department of Pediatric Cardiology, West China Second University Hospital, Sichuan University, Chengdu, China.
  • Yongrui Wang
    Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China.
  • Chulin Sha
    Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China.
  • Min He
    Department of Endocrinology, Shanghai Medical School, Huashan Hospital, Fudan University, Shanghai, China.
  • Xiaolin Li
    National Science Foundation Center for Big Learning, University of Florida, Gainesville, FL 32611, USA.