Accurate Physical Property Predictions via Deep Learning.

Journal: Molecules (Basel, Switzerland)
Published Date:

Abstract

Neural networks and deep learning have been applied to problems in drug discovery with steadily increasing accuracy. Nevertheless, many challenges remain, along with opportunities to improve the accuracy of molecular property prediction further. Here, we propose a deep-learning architecture, the Bidirectional long short-term memory with Channel and Spatial Attention network (BCSA), whose training process is fully data-driven and end-to-end. It is based on data augmentation and SMILES tokenization and does not rely on auxiliary knowledge, such as complex spatial structure. In addition, our model takes advantage of the long short-term memory (LSTM) network in sequence processing. The embedded channel and spatial attention modules, in turn, identify the key factors in the SMILES sequence for predicting properties. The model was further improved by Bayesian optimization. In this work, we demonstrate that the trained BCSA model is capable of predicting aqueous solubility. Furthermore, our proposed method shows noticeable superiority and competitiveness in predicting the oil-water partition coefficient when compared with state-of-the-art graph-based models, including the graph convolutional network (GCN), message-passing neural network (MPNN), and AttentiveFP.
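To illustrate the SMILES tokenization step mentioned in the abstract, the following is a minimal sketch of one common regex-based approach. The abstract does not specify the tokenization scheme actually used, so the token pattern, the function names (`tokenize_smiles`, `build_vocab`, `encode`), and the `<pad>`/`<unk>` special tokens are all assumptions made for illustration only.

```python
import re

# Assumed token pattern (not taken from the paper): bracket atoms, the
# two-letter halogens, two-digit ring closures, single-letter atoms,
# ring-closure digits, and bond/branch symbols.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%[0-9]{2}|[BCNOSPFIbcnosp]|[0-9]|[=#\-+\\/:~@?>*$().])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of chemically meaningful tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Reject strings containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Map each distinct token to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    aspirin = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
    vocab = build_vocab([aspirin])
    print(aspirin)
    print(encode(aspirin, vocab, max_len=32))
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer followed by a recurrent network such as a bidirectional LSTM. Treating `Cl` and `Br` as single tokens keeps the tokenizer from splitting them into a carbon or boron atom plus a stray letter.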

Authors

  • Yuanyuan Hou
    State Key Laboratory of Medicinal Chemical Biology and College of Pharmacy, Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Tianjin 300071, People's Republic of China. Electronic address: houyy@nankai.edu.cn.
  • Shiyu Wang
    Research Center for Computer-Aided Drug Discovery, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
  • Bing Bai
    Department of Rehabilitation, the First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
  • H C Stephen Chan
    Research Center for Computer-Aided Drug Discovery, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; AlphaMol Science Ltd, CH-4123 Allschwil, Switzerland.
  • Shuguang Yuan
    Research Center for Computer-Aided Drug Discovery, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; AlphaMol Science Ltd, CH-4123 Allschwil, Switzerland; Institute of Chemical Science and Engineering (ISIC), Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland. Electronic address: shuguang.yuan@gmail.com.