MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction.

Journal: Briefings in Bioinformatics
Published Date:

Abstract

MOTIVATION: Accurate and efficient prediction of molecular properties is one of the fundamental issues in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in the feature design and selection process. With the development of artificial intelligence (AI) technologies, data-driven methods exhibit unparalleled advantages over feature engineering-based methods in various domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from the scarcity of labeled data and show poor generalization ability.

Authors

  • Xiao-Chen Zhang
    The College of Computer, National University of Defense Technology, China.
  • Cheng-Kun Wu
    State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, China.
  • Zhi-Jiang Yang
    Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.
  • Zhen-Xing Wu
    College of Pharmaceutical Sciences, Zhejiang University, China.
  • Jia-Cai Yi
    State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China.
  • Chang-Yu Hsieh
    Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China.
  • Ting-Jun Hou
    Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
  • Dong-Sheng Cao
    Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.