MIFS: An adaptive multipath information fused self-supervised framework for drug discovery.

Journal: Neural networks : the official journal of the International Neural Network Society

Abstract

The production of expressive molecular representations from scarce labeled data is a key challenge for AI-driven drug discovery. Mainstream studies often follow a pipeline that pre-trains a specific molecular encoder and then fine-tunes it. However, these methods face two significant limitations: (1) they neglect the propagation of diverse information within molecules, and (2) their pre-training strategies lack knowledge and chemical constraints. In this study, we propose an adaptive multipath information fused self-supervised framework (MIFS) that learns molecular representations from large-scale unlabeled data to aid drug discovery. In MIFS, we design a dedicated molecular graph encoder, Mol-EN, which implements three pathways of information propagation (atom-to-atom, chemical bond-to-atom, and group-to-atom) to comprehensively perceive and capture rich semantic information. Furthermore, a novel adaptive pre-training strategy based on molecular scaffolds is devised to pre-train Mol-EN on 11 million unlabeled molecules. It optimizes Mol-EN with a topological contrastive loss that provides additional chemical insight into molecular structures. Subsequently, the pre-trained Mol-EN is fine-tuned on 14 widely used drug discovery benchmark datasets covering molecular property prediction, drug-target interaction, and drug-drug interaction tasks. Notably, to further enhance the model's chemical knowledge, we introduce an elemental knowledge graph (ElementKG) in the fine-tuning phase. Extensive experiments show that MIFS achieves competitive performance while providing plausible explanations for its predictions from a chemical perspective.
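The abstract does not give Mol-EN's update equations or the exact form of the topological contrastive loss, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that each pathway (atom-to-atom, bond-to-atom, group-to-atom) is a mean over the relevant neighbors, and that "adaptive" fusion means a softmax over three learnable scores (`path_logits`). The `info_nce` function is a generic InfoNCE contrastive loss used as a stand-in; it is not the authors' topological loss. All function and parameter names (`multipath_update`, `groups`, `path_logits`, `info_nce`) are hypothetical.

```python
import math


def mean(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax, used here as the (assumed) adaptive fusion weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multipath_update(atom_feats, bonds, bond_feats, groups, group_feats, path_logits):
    """One hypothetical update step fusing three message pathways per atom.

    atom_feats:  list of atom feature vectors, all of length d
    bonds:       list of (i, j) atom index pairs, treated as undirected
    bond_feats:  bond feature vectors (length d), aligned with `bonds`
    groups:      list of atom-index lists, one per (hypothetical) functional group
    group_feats: group feature vectors (length d), aligned with `groups`
    path_logits: three scores turned into fusion weights by a softmax
    """
    d = len(atom_feats[0])
    n = len(atom_feats)
    neigh_atoms = [[] for _ in range(n)]
    neigh_bonds = [[] for _ in range(n)]
    for (i, j), b in zip(bonds, bond_feats):
        neigh_atoms[i].append(atom_feats[j])  # atom-to-atom messages
        neigh_atoms[j].append(atom_feats[i])
        neigh_bonds[i].append(b)  # bond-to-atom messages
        neigh_bonds[j].append(b)
    member_groups = [[] for _ in range(n)]
    for atoms, g in zip(groups, group_feats):
        for a in atoms:
            member_groups[a].append(g)  # group-to-atom messages
    w = softmax(path_logits)
    updated = []
    for a in range(n):
        paths = [mean(neigh_atoms[a], d), mean(neigh_bonds[a], d), mean(member_groups[a], d)]
        fused = [sum(w[p] * paths[p][k] for p in range(3)) for k in range(d)]
        # Residual connection: old atom state plus the fused multipath message.
        updated.append([x + m for x, m in zip(atom_feats[a], fused)])
    return updated


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log(exp(s+) / sum(exp(s))) over cosine similarities."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return dot / (nu * nv) if nu and nv else 0.0

    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, neg) / temperature for neg in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # Toy 3-atom chain 0-1-2 with one group containing atoms 1 and 2.
    out = multipath_update(
        atom_feats=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        bonds=[(0, 1), (1, 2)],
        bond_feats=[[0.5, 0.5], [1.0, 0.0]],
        groups=[[1, 2]],
        group_feats=[[0.0, 2.0]],
        path_logits=[0.0, 0.0, 0.0],  # equal weights of 1/3 per pathway
    )
    print(out)
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0))
```

With equal `path_logits`, each pathway contributes one third of the fused message; learning these scores is one simple way to let the encoder weight pathways adaptively. In a contrastive pre-training setup, `anchor` and `positive` would be embeddings of two views of the same molecule (for example, views that share a scaffold), and `negatives` would be embeddings of other molecules; how MIFS actually constructs these pairs is not specified in the abstract.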

Authors

  • Xu Gong
    Department of Orthodontics, Peking University School and Hospital of Stomatology, 22 Zhongguancun South Avenue, Haidian District, Beijing, P.R. China.
  • Qun Liu
    Department of Burn and Plastic Surgery, the Fourth Hospital of Tianjin, Tianjin 300222, China; Email: 1502831499@qq.com.
  • Rui Han
    China Environment Publishing Group, Beijing, 100062, People's Republic of China.
  • Yike Guo
    Department of Computing, Imperial College, London SW7 2AZ, UK. y.guo@imperial.ac.uk.
  • Guoyin Wang