Optimizing Vanadium-Catalyzed Epoxidation Reactions: Machine-Learning-Driven Yield Predictions and Data Augmentation.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Catalytic epoxidations are key chemical processes serving as essential steps in the synthesis of commercially valuable compounds. This study presents an innovative supervised machine learning (ML) model to predict the reaction yield of the vanadium-catalyzed epoxidation of small alcohols and alkenes. Our framework uncovers relevant chemical characteristics for structure design, offering a pathway for automated optimization of epoxidation reactions. The study also incorporates the concept of data augmentation, handling experimental variability by generating synthetic reactions to densify under-represented data segments. Trained on a curated data set of 273 experimental epoxidation reactions with vanadyl catalyst groups, the model achieved a predictive test score of 90%, with a mean absolute yield prediction error of 4.7%. The ML model offers a high degree of explainability, as descriptor analysis identified key experimental and chemical descriptors that influence catalytic reaction predictions. This represents a significant development in catalytic epoxidation studies, highlighting the critical role of data science in reaction research and catalyst optimization.
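
The abstract does not say how the synthetic reactions are generated or which metric the "test score" refers to, so the following is only an illustrative sketch, not the authors' method. It assumes each reaction is a pair of numeric descriptors and a yield in percent, groups reactions into yield bins, and fills the sparser bins with copies whose yield is perturbed by Gaussian noise (a stand-in for experimental variability). The function names, the bin width, and the noise level are all hypothetical choices made for this example. It also shows how a mean absolute yield error, the quantity reported as 4.7%, is computed.

```python
import random
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted yields."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def augment_sparse_bins(rows, bin_width=20.0, noise_sd=2.0, seed=0):
    """Densify under-represented yield ranges with jittered copies.

    rows: list of (descriptors, yield_pct) pairs. Hypothetical layout,
    not taken from the paper. bin_width and noise_sd are assumptions.
    """
    rng = random.Random(seed)

    # Group reactions by yield bin; a yield of 100% falls in the top bin.
    bins = {}
    for descriptors, y in rows:
        key = int(min(y, 99.999) // bin_width)
        bins.setdefault(key, []).append((descriptors, y))

    # Bring every bin up to the size of the most populated one.
    target = max(len(members) for members in bins.values())
    augmented = list(rows)
    for members in bins.values():
        for _ in range(target - len(members)):
            descriptors, y = rng.choice(members)
            # Perturb the yield and clamp it to the physical 0-100% range.
            new_y = min(100.0, max(0.0, y + rng.gauss(0.0, noise_sd)))
            augmented.append((list(descriptors), new_y))
    return augmented


if __name__ == "__main__":
    # Made-up toy data: three high-yield reactions and one low-yield one.
    toy = [([0.1, 1.0], 85.0), ([0.2, 1.0], 90.0),
           ([0.3, 0.0], 95.0), ([0.4, 0.0], 30.0)]
    print(len(augment_sparse_bins(toy)))
    print(mean_absolute_error([80.0, 60.0], [75.0, 70.0]))
```

In a scheme like this, synthetic rows should be added to the training split only, so that the reported test error still reflects purely experimental reactions.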

Authors

  • José Ferraz-Caetano
    Department of Chemistry and Biochemistry - Faculty of Sciences, University of Porto - Rua do Campo Alegre, S/N, 4169-007 Porto, Portugal.
  • Filipe Teixeira
    Centre of Chemistry, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal.
  • M Natália D S Cordeiro