Machine Learning to Predict Homolytic Dissociation Energies of C-H Bonds: Calibration of DFT-based Models with Experimental Data.

Journal: Molecular Informatics
Published Date:

Abstract

Random Forest (RF) QSPR models were developed with a data set of homolytic bond dissociation energies (BDE) previously calculated by B3LYP/6-311++G(d,p)//DFTB for 2263 sp3 C-H covalent bonds. The best set of attributes consisted of 114 descriptors of the carbon atom (counts of atom types in 5 spheres around the kernel atom, plus ring descriptors). The optimized model predicted the DFT-calculated BDE of an independent test set of 224 bonds with a mean absolute error (MAE) of 2.86 kcal/mol. When the RF was applied to a new data set of 409 bonds from the iBonD database (http://ibond.nankai.edu.cn), the predictions showed a modest MAE (5.36 kcal/mol) but a relatively high correlation coefficient (R=0.75) against experimental energies. A prediction scheme was therefore explored that corrects the RF prediction with the average deviation observed for the k nearest neighbours (KNN) in an additional memory of experimental data. The corrected predictions achieved MAE=2.22 kcal/mol against the corresponding experimental bond energies for an independent test set of 145 bonds.
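
The KNN correction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the Euclidean distance in descriptor space, the choice k=3, the sign convention (deviation = experimental minus RF-predicted) and all toy numbers below are assumptions made purely for illustration.

```python
import math

def knn_corrected_prediction(x_query, rf_query, memory, k=3):
    """Shift an RF prediction by the mean deviation of its k nearest memory bonds.

    memory is a list of (descriptors, rf_prediction, experimental_bde) tuples.
    Euclidean distance and k=3 are assumptions, not taken from the paper.
    """
    # Rank the memory entries by distance to the query bond in descriptor space.
    nearest = sorted(memory, key=lambda m: math.dist(m[0], x_query))[:k]
    # Average deviation (experimental - RF) over the k nearest neighbours.
    mean_dev = sum(exp - rf for _, rf, exp in nearest) / len(nearest)
    return rf_query + mean_dev

# Invented toy memory: 2 descriptors per bond, energies in kcal/mol.
memory = [
    ([0.0, 0.0], 100.0, 98.0),
    ([1.0, 0.0], 101.0, 99.5),
    ([0.0, 1.0], 99.0, 97.0),
    ([5.0, 5.0], 90.0, 95.0),
]
corrected = knn_corrected_prediction([0.2, 0.1], 100.5, memory, k=3)
print(round(corrected, 2))
```

In this toy case the three nearest bonds were all overestimated by the RF, so the prediction for the query bond is shifted downward by their mean deviation; the distant fourth bond does not contribute.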

Authors

  • Wanli Li
    Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng, 475004, P.R. China.
  • Yue Luan
    Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng, 475004, P.R. China.
  • Qingyou Zhang
    Institute of Environmental and Analytical Sciences, College of Chemistry and Chemical Engineering, Henan University, Kaifeng, 475004, P.R. China.
  • João Aires-de-Sousa
    LAQV-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal. Phone/fax: +351 21 2948300. E-mail: joao@airesdesousa.com.