Single-Step Retrosynthesis Prediction Based on the Identification of Potential Disconnection Sites Using Molecular Substructure Fingerprints.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Applying retrosynthesis properly to identify possible transformations for a given target compound requires extensive chemistry knowledge and experience. However, because the complexity of the technique scales with the complexity of the target, applying it efficiently to compounds with intricate molecular structures becomes almost impossible for human chemists. The idea of using computers in such situations has existed for a long time, but the accuracy was not sufficient for practical applications. With the steady improvement of machine learning and artificial intelligence in recent years, however, computer-assisted retrosynthesis has been gaining research attention again. Because chemical reaction data are scarce overall, the main challenge for recent retrosynthesis methods is their low exploration ability during the analysis of target and intermediate compounds. The main goal of this research is to develop a novel, template-free approach that addresses this issue. Only individual molecular substructures of the target are used to determine potential disconnection sites, without relying on additional information such as the chemical reaction class. The model that identifies potential disconnection sites is trained on novel molecular substructure fingerprint representations. For each disconnection suggested by the model, a simple structural similarity-based reactant retrieval and scoring method is applied to complete the suggestion. This method achieves 47.2% top-1 accuracy on the single-step retrosynthesis task on the processed United States Patent and Trademark Office (USPTO) dataset. Furthermore, if the predicted reaction class is used to narrow down the reactant candidate search space, the performance improves to 61.4% top-1 accuracy.
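The similarity-based reactant retrieval and the top-1 accuracy metric mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so Tanimoto similarity over sets of "on" fingerprint bits is assumed here, and the names `tanimoto`, `retrieve_reactants`, and `top_k_accuracy` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumption: a fingerprint is represented as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def retrieve_reactants(
    query: Fingerprint,
    candidates: Dict[str, Fingerprint],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate reactants by similarity to a query fragment fingerprint."""
    scored = [(name, tanimoto(query, fp)) for name, fp in candidates.items()]
    # Sort by descending score; break ties by name for a deterministic order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of targets whose true answer appears in the first k predictions."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, ground_truth)
        if truth in preds[:k]
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Toy bit sets, not real molecular fingerprints.
    query = frozenset({1, 2, 3, 4})
    pool = {
        "candidate_a": frozenset({1, 2, 3, 4}),
        "candidate_b": frozenset({1, 2}),
        "candidate_c": frozenset({9}),
    }
    print(retrieve_reactants(query, pool))
    print(top_k_accuracy([["x", "y"], ["y", "x"]], ["x", "x"], k=1))
```

In practice the bit sets would come from a cheminformatics toolkit rather than being written by hand, and narrowing `candidates` to a single predicted reaction class before ranking would correspond to the search-space restriction described in the last sentence of the abstract.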

Authors

  • Haris Hasic
    Department of Computer Science, School of Computing, Tokyo Institute of Technology, W8-85, 2-12-1, Ookayama, Meguro 152-8552, Tokyo, Japan.
  • Takashi Ishida
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan.