SNAr Regioselectivity Predictions: Machine Learning Triggering DFT Reaction Modeling through Statistical Threshold.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Fast and accurate prospective predictions of regioselectivity can significantly reduce the time and resources spent on unproductive transformations in the pharmaceutical industry. Density functional theory (DFT) reaction modeling through transition state theory (TST) and machine learning (ML) methods have both been widely used to predict reaction outcomes such as selectivity. However, TST reaction modeling is time-consuming, and ML methods are data-dependent. Herein, we introduce a prototype that bridges ML and TST modeling by triggering resource-intensive but much less domain-sensitive DFT calculations only for less confident ML predictions. The proposed workflow was trained and tested on both the Pfizer internal dataset and the USPTO public dataset to predict regioselectivity for SNAr reactions. Our method is accurate and fast, achieving 96.3% and 94.7% accuracy in predicting the correct major product on the Pfizer and USPTO datasets, respectively, in a fraction of the conventional TST computing time.
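
The triggering idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `predict_major_site`, the use of the top ML score as the confidence measure, the default threshold of 0.9, and the convention that the lowest DFT barrier marks the major product are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple


def predict_major_site(
    ml_scores: Dict[str, float],
    dft_barriers: Callable[[], Dict[str, float]],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> Tuple[str, str]:
    """Return (predicted site, source of the prediction).

    ml_scores    -- ML score per candidate reaction site (assumed to act
                    as a confidence, e.g. a normalized probability)
    dft_barriers -- zero-argument callable that runs the expensive DFT/TST
                    step and returns an activation barrier per site; it is
                    only called when the ML prediction is not confident
    """
    best_site = max(ml_scores, key=ml_scores.get)

    # Confident ML prediction: accept it and skip DFT entirely.
    if ml_scores[best_site] >= threshold:
        return best_site, "ml"

    # Less confident prediction: fall back to DFT and pick the site with
    # the lowest barrier (assumed here to give the major product).
    barriers = dft_barriers()
    return min(barriers, key=barriers.get), "dft"
```

Passing the DFT step as a callable keeps it lazy, so its cost is paid only for the substrates that fall below the threshold, which is where the saving in computing time would come from in a scheme of this kind.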

Authors

  • Yanfei Guan
    Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA whgreen@mit.edu kfjensen@mit.edu.
  • Taegyo Lee
    Chemical Research and Development, Groton Laboratories, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States.
  • Ke Wang
    China Electric Power Research Institute, Haidian District, Beijing 100192, China. wangke1@epri.sgcc.com.cn.
  • Shu Yu
    Chemical Research and Development, Groton Laboratories, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States.
  • J Christopher McWilliams
    Chemical Research and Development, Groton Laboratories, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States.