SimSon: simple contrastive learning of SMILES for molecular property prediction.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Molecular property prediction with deep learning has accelerated drug discovery and retrosynthesis. However, the shortage of labeled molecular data and the difficulty of generalizing across vast chemical space remain significant hurdles for deep learning in this setting. This study proposes a self-supervised framework for learning representations of Simplified Molecular Input Line Entry System (SMILES) strings, which we call Simple SMILES contrastive learning (SimSon). SimSon was pre-trained on unlabeled SMILES data with contrastive learning so that it learns these representations without property labels.
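
To illustrate the general idea of contrastive pre-training mentioned in the abstract, the following is a minimal sketch of an NT-Xent (normalized temperature-scaled cross-entropy) loss, a loss commonly used in contrastive learning. The abstract does not state which loss, encoder, or view-generation scheme SimSon uses, so everything here is an assumption: the function names `cosine` and `nt_xent_loss`, the `temperature` value, and the hand-written 2-D vectors standing in for encoder outputs are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(view_a, view_b, temperature=0.1):
    """NT-Xent loss over N positive pairs (hypothetical helper).

    view_a[i] and view_b[i] are assumed to be embeddings of two
    views of the same molecule i; every other embedding in the
    batch is treated as a negative.
    """
    z = view_a + view_b          # concatenate into 2N embeddings
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the paired view
        # Similarities to every other embedding (self excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - cosine(z[i], z[pos]) / temperature
    return total / (2 * n)

# Toy embeddings (made up for illustration, not real encoder outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]     # paired views lie close together
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # paired views lie far apart

loss_matched = nt_xent_loss(anchors, matched)
loss_mismatched = nt_xent_loss(anchors, mismatched)
```

In this sketch the loss is lower when the two views of each molecule are embedded close together than when they are far apart, which is the signal that lets an encoder learn from unlabeled data.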

Authors

  • Chae Eun Lee
    Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea.
  • Jin Sob Kim
    Department of Industrial and Management Engineering, Korea University, Seoul 02841, Republic of Korea.
  • Jin Hong Min
    Department of Industrial and Management Engineering, Korea University, Seoul 02841, Republic of Korea.
  • Sung Won Han
    Department of Industrial and Management Engineering, Korea University, Seoul 02841, Republic of Korea.