Evaluating and clustering retrosynthesis pathways with learned strategy.

Journal: Chemical Science
Published Date:

Abstract

With recent advances in computer-aided synthesis planning (CASP) powered by data science and machine learning, modern CASP programs can rapidly identify thousands of potential pathways for a given target molecule. However, the lack of a holistic pathway evaluation mechanism makes it challenging to systematically prioritize strategic pathways beyond applying simple heuristics. Herein, we introduce a data-driven approach to evaluate the relative strategic levels of retrosynthesis pathways using a dynamic tree-structured long short-term memory (tree-LSTM) model. We first curated a retrosynthesis pathway database containing 238k patent-extracted pathways along with ∼55 M artificial pathways generated by an open-source CASP program, ASKCOS. The tree-LSTM model was trained to differentiate patent-extracted pathways from artificial pathways sharing the same target molecule, in order to learn the strategic relationships among single-step reactions within the patent-extracted pathways. The model achieved a top-1 ranking accuracy of 79.1% in recognizing patent-extracted pathways. In addition, the trained tree-LSTM model learned to encode pathway-level information into a representative latent vector, which can facilitate clustering of similar pathways and help illustrate the strategically diverse pathways generated by CASP programs.
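To make the top-1 ranking metric concrete, the following is a minimal sketch and not the authors' evaluation code. It assumes each candidate pathway is summarized as a hypothetical `(target, score, is_patent)` record, where `score` stands in for the model's output and `is_patent` marks the patent-extracted pathway. A target counts as a hit when its highest-scoring pathway is the patent-extracted one.

```python
from collections import defaultdict


def top1_ranking_accuracy(records):
    """Fraction of targets whose top-scored pathway is patent-extracted.

    `records` is an iterable of (target, score, is_patent) tuples. This
    record layout is an assumption made for illustration only.
    """
    groups = defaultdict(list)
    for target, score, is_patent in records:
        groups[target].append((score, is_patent))
    if not groups:
        raise ValueError("no pathways to evaluate")

    hits = 0
    for candidates in groups.values():
        # On tied scores, max() keeps the first candidate encountered.
        _, best_is_patent = max(candidates, key=lambda c: c[0])
        hits += int(best_is_patent)
    return hits / len(groups)


# Hypothetical scores for two targets: the patent pathway ranks first
# for T1 but is outscored by an artificial pathway for T2.
records = [
    ("T1", 0.9, True), ("T1", 0.4, False), ("T1", 0.2, False),
    ("T2", 0.3, True), ("T2", 0.7, False),
]
accuracy = top1_ranking_accuracy(records)
print(accuracy)
```

Grouping by target mirrors the training setup described above, in which patent-extracted and artificial pathways are compared only when they share the same target molecule.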

Authors

  • Yiming Mo
    Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. kfjensen@mit.edu
  • Yanfei Guan
    Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. whgreen@mit.edu, kfjensen@mit.edu
  • Pritha Verma
    Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. kfjensen@mit.edu
  • Jiang Guo
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
  • Mike E Fortunato
    Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. kfjensen@mit.edu
  • Zhaohong Lu
    Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
  • Connor W Coley
    Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. whgreen@mit.edu, kfjensen@mit.edu
  • Klavs F Jensen
    Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. whgreen@mit.edu, kfjensen@mit.edu
