ElectroPredictor: An Application to Predict Mayr's Electrophilicity through Implementation of an Ensemble Model Based on Machine Learning Algorithms.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Electrophilicity (E) is one of the most important parameters to understand the reactivity of an organic molecule. Although the theoretical electrophilicity index (ω) has been associated with E in a small homologous series, the use of ω to predict E in a structurally heterogeneous set of compounds is not a trivial task. In this study, a robust ensemble model is created using Mayr's database of reactivity parameters. A combination of topological and quantum mechanical descriptors and different machine learning algorithms are employed for the model's development. The predictability of the model is assessed using different statistical parameters, and its validation is examined, including a training/test partition, an applicability domain, and a Y-scrambling test. The global ensemble model presents a of 0.909 and a of 0.912, demonstrating an excellent predictability performance of E values and showing that ω is not a good descriptor for the prediction of E, especially for the case of neutral compounds. ElectroPredictor, a noncommercial Python application (https://github.com/mmoreno1/ElectroPredictor), is developed to predict E. QM9, a well-known large dataset containing 133885 neutral molecules, is used to perform a virtual screening (94.0% coverage). Finally, the 10 most electrophilic molecules are analyzed as possible new Mayr's electrophiles, which have not yet been experimentally tested. This study confirms the necessity to build an ensemble model using nonlinear machine learning algorithms, topographic descriptors, and separating molecules into charged and neutral compounds to predict E with precision.
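
The Y-scrambling test mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the ensemble of machine learning algorithms for a one-variable least-squares fit, and it uses synthetic data rather than Mayr's database. The names `r_squared` and `y_scrambling` are hypothetical helpers introduced only for this illustration. The idea is that a model fitted to real responses should score far better than the same model fitted to randomly permuted responses; if it does not, the original fit is likely a chance correlation.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of a one-variable least-squares fit.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def y_scrambling(x, y, n_rounds=200, seed=0):
    """Return the real R2 and the R2 values obtained after shuffling y."""
    rng = random.Random(seed)
    scrambled = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-response link
        scrambled.append(r_squared(x, shuffled))
    return r_squared(x, y), scrambled


# Synthetic toy data (an assumption for illustration, not experimental values):
# a single descriptor x and a response y that depends linearly on it plus noise.
data_rng = random.Random(42)
x = [i / 10 for i in range(50)]
y = [2.0 * xi - 5.0 + data_rng.gauss(0, 0.3) for xi in x]

real_r2, scrambled_r2 = y_scrambling(x, y)
mean_scrambled = sum(scrambled_r2) / len(scrambled_r2)
print(f"R2 with real responses:      {real_r2:.3f}")
print(f"mean R2 with scrambled ones: {mean_scrambled:.3f}")
```

A large gap between the two printed values is what a passing Y-scrambling test looks like; in practice the same comparison would be run with the actual model and its cross-validated statistics.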

Authors

  • Sebastián A Cuesta
    Instituto de Simulación Computacional (ISC-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador.
  • Martín Moreno
    Instituto de Simulación Computacional (ISC-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador.
  • Romina A López
    Colegio San Ignacio de Loyola-Fe y Alegría, Ministerio de Educación, Quito 170901, Ecuador.
  • José R Mora
    Instituto de Simulación Computacional (ISC-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador.
  • José Luis Paz
    Departamento Académico de Química Inorgánica, Facultad de Química e Ingeniería Química, Universidad Nacional Mayor de San Marcos, Cercado de Lima, Lima 15081, Peru.
  • Edgar A Márquez
    Grupo de Investigaciones en Química y Biología, Departamento de Química y Biología, Facultad de Ciencias Exactas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla 081007, Colombia.