Bayesian Algorithm for Retrosynthesis.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

The identification of synthetic routes that end with a desired product is considered an inherently time-consuming process that depends largely on expert knowledge of a limited proportion of the entire reaction space. At present, emerging machine learning technologies are reshaping the process of retrosynthetic planning. This study aimed to discover synthetic routes backward from a given target molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task whose solution space is subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow first trains a deep neural network that predicts, in the forward direction, the product of given reactants with a high level of accuracy. This forward model is then inverted into a backward model via Bayes' law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is extensively explored using a Monte Carlo search algorithm. With a forward model prediction accuracy of approximately 87%, the Bayesian retrosynthesis algorithm rediscovered 81.8% and 33.3% of known one-step and two-step synthetic routes, respectively, within its top-10 candidates. Remarkably, the Monte Carlo algorithm, which was specifically designed to account for the existence of multiple diverse routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We also investigated the potential applicability of such diverse candidates based on expert knowledge of synthetic organic chemistry.
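The Bayesian inversion described above, p(reactants | product) ∝ p(product | reactants) p(reactants), can be illustrated with a minimal Python sketch under several assumptions. First, a hypothetical lookup table (`FORWARD_TABLE`) stands in for the trained neural network. Second, the prior is uniform over pairs of distinct purchasable compounds. Third, plain importance sampling from the prior replaces the authors' Monte Carlo search algorithm. All names and compounds here are illustrative placeholders, not the paper's implementation.

```python
import random
from collections import defaultdict

# Hypothetical stock of purchasable building blocks (placeholder labels).
PURCHASABLE = ["A", "B", "C", "D", "E"]

# Hypothetical stand-in for the trained forward model p(product | reactants).
# Keys are unordered reactant pairs; values map products to probabilities.
FORWARD_TABLE = {
    frozenset({"A", "B"}): {"AB": 0.9, "BA": 0.1},
    frozenset({"C", "D"}): {"AB": 0.6, "CD": 0.4},
    frozenset({"A", "E"}): {"AE": 0.95, "AB": 0.05},
}


def forward_prob(pair, product, eps=1e-6):
    """Likelihood p(product | pair); eps gives unseen pairs a tiny probability."""
    return FORWARD_TABLE.get(pair, {}).get(product, eps)


def sample_prior(rng):
    """Draw from a uniform prior over unordered pairs of distinct compounds."""
    return frozenset(rng.sample(PURCHASABLE, 2))


def backward_posterior(product, n_samples=20000, seed=0):
    """Estimate p(pair | product), proportional to p(product | pair) * p(pair),
    by importance sampling: draw pairs from the prior, weight by likelihood."""
    rng = random.Random(seed)
    weights = defaultdict(float)
    for _ in range(n_samples):
        pair = sample_prior(rng)
        weights[pair] += forward_prob(pair, product)
    total = sum(weights.values())
    # Normalize and rank candidate reactant pairs by estimated posterior.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(sorted(pair), w / total) for pair, w in ranked]


if __name__ == "__main__":
    for reactants, prob in backward_posterior("AB")[:3]:
        print(reactants, round(prob, 3))
```

The result is a ranked list of candidate reactant pairs for the target, mirroring the ranked route lists described in the abstract. Sampling blindly from the prior is only workable for a toy stock. With realistic libraries of purchasable compounds, almost every draw has negligible likelihood, which is why a more targeted search over the combinatorial space is needed.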

Authors

  • Zhongliang Guo
    The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan.
  • Stephen Wu
    School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Mitsuru Ohno
    Daicel Corporation, Kita-ku, Osaka 530-0011, Japan.
  • Ryo Yoshida
    The Graduate University for Advanced Studies (SOKENDAI), Tachikawa, Japan. yoshidar@ism.ac.jp.