Bimolecular Nucleophilic Substitution Reactions: Predictive Models for Rate Constants and Molecular Reaction Pairs Analysis.

Journal: Molecular informatics
Published Date:

Abstract

Here, we report the visualization, analysis and modeling of a large set of 4830 SN2 reactions whose rate constants (logk) were measured under different experimental conditions (solvent, temperature). Each reaction was encoded as a single molecular graph, the Condensed Graph of Reaction, which allowed us to use conventional chemoinformatics techniques developed for individual molecules. On this basis, a Matched Reaction Pairs approach was proposed and used to analyze substituent effects on the reactivity of substrates and nucleophiles. The data were visualized using Generative Topographic Mapping. A consensus Support Vector Regression (SVR) model for the rate constant was built. An unbiased estimate of the model's performance was obtained by cross-validation on reactions measured for unique structural transformations. The model's performance in cross-validation (RMSE=0.61 logk units) and on the external test set (RMSE=0.80) is close to the noise in the data. The performance of local models built on selected subsets of reactions, those proceeding in particular solvents or with particular types of nucleophiles, was similar to that of the model built on the entire set. Finally, four different definitions of the model's applicability domain for reactions were examined.
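
The idea of cross-validating on unique structural transformations can be illustrated with a minimal sketch. It is not the authors' procedure: it only assumes that each measurement carries an identifier of its structural transformation and that all measurements sharing an identifier must fall into the same fold, so that a transformation never appears in both training and validation data. The function names `grouped_folds` and `rmse` and the example identifiers are hypothetical.

```python
import math
from collections import defaultdict


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences (e.g. logk values)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def grouped_folds(group_ids, n_folds):
    """Split record indices into folds without splitting any group.

    group_ids: one identifier per record (assumed here to label the
    structural transformation of the reaction; hypothetical input).
    Groups are placed largest first into the currently smallest fold.
    """
    members = defaultdict(list)
    for index, gid in enumerate(group_ids):
        members[gid].append(index)

    folds = [[] for _ in range(n_folds)]
    ordered = sorted(members, key=lambda g: (-len(members[g]), str(g)))
    for gid in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[gid])
    return folds


# Six measurements of three hypothetical transformations A, B and C,
# e.g. the same transformation measured in different solvents.
transformations = ["A", "A", "B", "C", "C", "C"]
folds = grouped_folds(transformations, n_folds=2)
print(folds)
```

Each fold would then serve once as the validation set, with `rmse` applied to the observed and predicted logk values of that fold. Splitting at random instead would let the same transformation, measured under different conditions, appear on both sides of the split and would tend to give an optimistic error estimate.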

Authors

  • Timur Gimadiev
    Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia.
  • Timur Madzhidov
    Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia.
  • Igor Tetko
    Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany.
  • Ramil Nugmanov
    Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia.
  • Iury Casciuc
    Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France.
  • Olga Klimchuk
    Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France.
  • Andrey Bodrov
    Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia.
  • Pavel Polishchuk
    Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Hněvotínská 1333/5, 77900, Olomouc, Czech Republic.
  • Igor Antipin
    Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia.
  • Alexandre Varnek
    Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France.