Chemistry-informed molecular graph as reaction descriptor for machine-learned retrosynthesis planning.

Journal: Proceedings of the National Academy of Sciences of the United States of America
PMID:

Abstract

Infusing "chemical wisdom" should improve the data-driven approaches that rely exclusively on historical synthetic data for automatic retrosynthesis planning. For this purpose, we designed a chemistry-informed molecular graph (CIMG) to describe chemical reactions. A collection of key information that is most relevant to chemical reactions is integrated in CIMG: NMR chemical shifts as vertex features, bond dissociation energies as edge features, and solvent/catalyst information as global features. For any given compound as a target, a product CIMG is generated and exploited by a graph neural network (GNN) model to choose reaction template(s) leading to this product. A reactant CIMG is then inferred and used in two GNN models to select appropriate catalyst and solvent, respectively. Finally, a fourth GNN model compares the two CIMG descriptors to check the plausibility of the proposed reaction. A reaction vector is obtained for every molecule in training these models. The chemical wisdom of reaction propensity contained in the pretrained reaction vectors is exploited to autocategorize molecules/reactions and to accelerate Monte Carlo tree search (MCTS) for multistep retrosynthesis planning. Full synthetic routes with recommended catalysts/solvents are predicted efficiently using this CIMG-based approach.
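
The three feature levels named in the abstract (vertex, edge, global) can be pictured with a small container type. The following is only a minimal sketch and not the authors' implementation: the class name `CIMG`, its field names, the `neighbors` helper, and every numeric value are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CIMG:
    """Hypothetical container for a chemistry-informed molecular graph."""

    atoms: List[str]              # element symbol for each vertex
    nmr_shifts: List[float]       # vertex feature: NMR chemical shift, one per atom
    bonds: List[Tuple[int, int]]  # edges as pairs of atom indices
    bond_energies: List[float]    # edge feature: bond dissociation energy, one per bond
    global_features: Dict[str, str] = field(default_factory=dict)  # e.g. solvent, catalyst

    def __post_init__(self) -> None:
        # Every vertex and every edge must carry exactly one feature value.
        if len(self.atoms) != len(self.nmr_shifts):
            raise ValueError("need one NMR shift per atom")
        if len(self.bonds) != len(self.bond_energies):
            raise ValueError("need one bond dissociation energy per bond")
        n = len(self.atoms)
        for i, j in self.bonds:
            if i == j or not (0 <= i < n and 0 <= j < n):
                raise ValueError(f"invalid bond ({i}, {j})")

    def neighbors(self, atom: int) -> List[int]:
        """Return the indices of atoms bonded to the given atom."""
        result = []
        for i, j in self.bonds:
            if i == atom:
                result.append(j)
            elif j == atom:
                result.append(i)
        return result


# Placeholder numbers only: these are not measured or computed values.
example = CIMG(
    atoms=["C", "C", "O"],
    nmr_shifts=[18.0, 58.0, 0.0],
    bonds=[(0, 1), (1, 2)],
    bond_energies=[88.0, 92.0],
    global_features={"solvent": "unspecified", "catalyst": "unspecified"},
)
```

In this sketch, a product graph and a reactant graph would both be instances of the same type, differing in connectivity and in which global features are filled in. How the actual GNN models consume such graphs is not specified in the abstract and is left out here.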

Authors

  • Baicheng Zhang
    Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China.
  • Xiaolong Zhang
  • Wenjie Du
    School of Software Engineering, University of Science and Technology of China, Hefei 230026, China; Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China.
  • Zhaokun Song
    Hefei JiShu Quantum Technology Co. Ltd., Hefei, Anhui 230026, China.
  • Guozhen Zhang
    Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China.
  • Guoqing Zhang
    Department of Anesthesiology, Zhumadian Central Hospital, Zhumadian, Henan Province, China. Electronic address: hubywk@163.com.
  • Yang Wang
    Department of General Surgery, The First People's Hospital of Yunnan Province, The Affiliated Hospital of Kunming University of Science and Technology, Kunming, China.
  • Xin Chen
    University of Nottingham, Nottingham, United Kingdom.
  • Jun Jiang
    Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China.
  • Yi Luo
    Electrical and Computer Engineering Department, Bioengineering Department, University of California, Los Angeles, CA 90095 USA, and also with the California NanoSystems Institute, University of California, Los Angeles, CA 90095 USA.