An Ensemble Structure and Physicochemical (SPOC) Descriptor for Machine-Learning Prediction of Chemical Reaction and Molecular Properties.

Journal: Chemphyschem : a European journal of chemical physics and physical chemistry
Published Date:

Abstract

Feature representations, or descriptors, are the chemical language of machines and largely shape the predictive capability, generalizability and interpretability of machine learning models. A generally applicable descriptor is highly desirable for chemists who face conventional prediction tasks with sparsely distributed and small datasets. Inspired by the chemist's view of molecules, we present herein an ensemble descriptor, SPOC, built on the principles of physical organic chemistry, that integrates the Structure and Physicochemical property (SPOC) of a molecule. SPOC can be readily constructed by combining molecular fingerprints, which represent the structure of a given molecule, with molecular physicochemical properties extracted from RDKit or Mordred molecular descriptors. The applicability of SPOC was surveyed across a range of well-structured chemical databases, with machine learning tasks ranging from regression to classification.
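
The construction described in the abstract, a structure block joined to a physicochemical block, can be sketched as a plain concatenation of two feature vectors. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_spoc` is hypothetical, and the fingerprint bits and property values are hard-coded placeholders. In practice the fingerprint would be computed by a cheminformatics toolkit and the properties taken from RDKit or Mordred descriptors, as the abstract states.

```python
from typing import List, Sequence


def build_spoc(fingerprint_bits: Sequence[int],
               physchem_values: Sequence[float]) -> List[float]:
    """Concatenate a structure block and a physicochemical block.

    Hypothetical helper: the abstract only says the two parts are combined;
    simple concatenation is an assumption made for this sketch.
    """
    # A binary fingerprint should contain only 0/1 entries.
    if any(bit not in (0, 1) for bit in fingerprint_bits):
        raise ValueError("fingerprint bits must be 0 or 1")

    structure_block = [float(bit) for bit in fingerprint_bits]
    property_block = [float(value) for value in physchem_values]
    return structure_block + property_block


# Placeholder inputs (not real data): an 8-bit toy fingerprint and three
# made-up property values standing in for toolkit-computed descriptors.
toy_fingerprint = [0, 1, 1, 0, 0, 0, 1, 0]
toy_properties = [120.5, 1.8, 37.3]

spoc_vector = build_spoc(toy_fingerprint, toy_properties)
print(len(spoc_vector))  # 8 structure entries followed by 3 property entries
```

Keeping the two blocks in a fixed order means any position in the final vector can be traced back to either a fingerprint bit or a named property, which is one plausible way such a descriptor could support interpretability.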

Authors

  • Qi Yang
    Department of Radiology, The First Hospital of Jilin University, No.1, Xinmin Street, Changchun 130021, China (Y.W., M.L., Z.M., J.W., K.H., Q.Y., L.Z., L.M., H.Z.).
  • Yidi Liu
    Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084, Beijing, China.
  • Junjie Cheng
    Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084, Beijing, China.
  • Yao Li
    Center of Robotics and Intelligent Machine, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, No. 266 Fangzhen Road, Beibei District, Chongqing, 400714, China.
  • Siyuan Liu
    Key laboratory of Transplantation, Chinese Academy of Medical Sciences, Tianjin, 300192, China; Tianjin Key Laboratory for Organ Transplantation, Tianjin First Center Hospital, Tianjin, 300192, China; Department of Liver Transplantation, Tianjin Medical University First Center Clinical College, Tianjin, 300192, China; Tianjin Key Laboratory of Molecular and Treatment of Liver Cancer, Tianjin First Center Hospital, Tianjin, 300192, China.
  • Yingdong Duan
    Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084, Beijing, China.
  • Long Zhang
    Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei 230036, PR China. liuyong@aiofm.ac.cn, zhanglong@aiofm.ac.cn, wangchongwen1987@126.com.
  • Sanzhong Luo
    Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084, Beijing, China.