Machine Learning Estimation of Atom Condensed Fukui Functions

Journal: Molecular Informatics

Abstract

To enable fast estimation of atom-condensed Fukui functions, machine learning algorithms were trained on databases of DFT-precalculated values for ca. 23,000 atoms in organic molecules. The problem was approached in two ways: as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank the atoms in a molecule, and to classify atoms as having a high or low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types identified the atom with the higher Fukui function with 93-94 % accuracy in pairs of atoms from the same molecule whose values differ by ≥0.1. In whole molecules, the atom with the highest Fukui function was recognized in ca. 50 % of cases and, on average, about 3 of the top 4 atoms were recognized in a shortlist of 4. Regression RF yielded predictions for the test sets with R² = 0.68-0.69, improving on the ability of BT coefficients to rank atoms in a molecule. Atom classification (high/low Fukui function) was achieved with RF with a sensitivity of 55-61 % and a specificity of 94-95 %.
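The Bradley-Terry ranking of atom types described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each molecule is assumed to be a list of (atom_type, fukui) tuples, BT strengths are fitted with the standard MM (Zermelo) iteration, and a pseudo-count `prior` is added to every observed pair of types, a regularisation the abstract does not mention. The function names (`count_wins`, `fit_bradley_terry`, `pairwise_accuracy`, `shortlist`) are hypothetical, element symbols stand in for the study's atom types, and the toy Fukui values are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def count_wins(molecules, min_diff=0.1):
    """Tally pairwise 'wins' between atom types (hypothetical input format).

    molecules: list of molecules, each a list of (atom_type, fukui) tuples.
    Only pairs of atoms in the same molecule whose Fukui values differ by at
    least min_diff are counted; same-type pairs carry no BT information.
    """
    wins = defaultdict(float)  # wins[(a, b)]: times type a had the higher value vs type b
    for atoms in molecules:
        for (ta, fa), (tb, fb) in combinations(atoms, 2):
            if ta == tb or abs(fa - fb) < min_diff:
                continue
            winner, loser = (ta, tb) if fa > fb else (tb, ta)
            wins[(winner, loser)] += 1
    return wins


def fit_bradley_terry(wins, prior=0.5, n_iter=1000, tol=1e-10):
    """Fit BT strengths p with the MM (Zermelo) iteration.

    Model: P(type i beats type j) = p_i / (p_i + p_j).
    `prior` pseudo-wins are added in both directions for every observed pair
    of types (an assumed regularisation that keeps never-beaten types finite).
    """
    pairs = {frozenset(k) for k in wins}
    w = defaultdict(float, wins)
    for pair in pairs:
        a, b = tuple(pair)
        w[(a, b)] += prior
        w[(b, a)] += prior
    types = sorted({t for pair in pairs for t in pair})
    total_wins = defaultdict(float)
    n_games = defaultdict(lambda: defaultdict(float))  # n_games[i][j]: comparisons i vs j
    for (a, b), c in w.items():
        total_wins[a] += c
        n_games[a][b] += c
        n_games[b][a] += c
    p = {t: 1.0 for t in types}
    for _ in range(n_iter):
        new = {}
        for i in types:
            denom = sum(n / (p[i] + p[j]) for j, n in n_games[i].items())
            new[i] = total_wins[i] / denom  # MM update: W_i / sum_j n_ij / (p_i + p_j)
        scale = len(types) / sum(new.values())  # fix the arbitrary scale: mean strength 1
        new = {t: v * scale for t, v in new.items()}
        converged = max(abs(new[t] - p[t]) for t in types) < tol
        p = new
        if converged:
            break
    return p


def pairwise_accuracy(molecules, p, min_diff=0.1):
    """Fraction of qualifying atom pairs whose higher-Fukui atom has the stronger type."""
    correct = total = 0
    for atoms in molecules:
        for (ta, fa), (tb, fb) in combinations(atoms, 2):
            if ta == tb or abs(fa - fb) < min_diff:
                continue
            total += 1
            correct += (p.get(ta, 0.0) > p.get(tb, 0.0)) == (fa > fb)
    return correct / total if total else float("nan")


def shortlist(atom_types, p, k=4):
    """Indices of the k atoms whose types have the largest BT strength."""
    order = sorted(range(len(atom_types)), key=lambda i: p.get(atom_types[i], 0.0), reverse=True)
    return order[:k]


molecules = [  # toy values invented for illustration, not data from the study
    [("S", 0.40), ("C", 0.05), ("O", 0.20)],
    [("O", 0.30), ("C", 0.10), ("C", 0.02)],
    [("S", 0.35), ("O", 0.10)],
    [("C", 0.25), ("O", 0.10)],
    [("S", 0.10), ("C", 0.30)],
]
p = fit_bradley_terry(count_wins(molecules))
print(sorted(p, key=p.get, reverse=True))       # ['S', 'O', 'C']
print(pairwise_accuracy(molecules, p))          # 0.75
print(shortlist(["C", "O", "S", "C"], p, k=2))  # [2, 1]
```

Because `shortlist` ranks atoms only by the strength of their type, atoms of the same type in one molecule always tie; this is plausibly one reason a regression model on richer atomic descriptors can rank atoms better than type coefficients alone.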
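The sphere-based atomic descriptors and the sensitivity/specificity figures quoted for classification can be illustrated in the same spirit. This sketch assumes that "spheres" are topological, i.e. the sets of atoms lying a fixed number of bonds from the kernel atom, and uses a hypothetical adjacency-list input with element symbols as atom types; the study's actual atom typing and sphere radii are not given in the abstract. The helper names are hypothetical. Fixed-length vectors of this kind could then be passed to a random forest implementation such as scikit-learn's `RandomForestRegressor` or `RandomForestClassifier`.

```python
from collections import Counter, deque


def sphere_counts(adjacency, atom_types, kernel, max_radius=3):
    """Count atom types in each topological sphere around the kernel atom.

    adjacency: dict mapping atom index -> list of bonded atom indices (assumed format).
    Sphere r contains the atoms exactly r bonds away (r = 0 is the kernel).
    Returns a Counter keyed by (r, atom_type).
    """
    dist = {kernel: 0}
    queue = deque([kernel])
    while queue:  # breadth-first search gives shortest bond distances
        a = queue.popleft()
        if dist[a] == max_radius:
            continue  # do not expand beyond the outermost sphere
        for b in adjacency[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return Counter((d, atom_types[a]) for a, d in dist.items())


def to_vector(counts, vocabulary):
    """Fixed-length feature vector over an ordered list of (radius, type) keys."""
    return [counts.get(key, 0) for key in vocabulary]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Heavy-atom graph of ethanol, C0-C1-O2 (toy example).
adjacency = {0: [1], 1: [0, 2], 2: [1]}
atom_types = ["C", "C", "O"]
vocabulary = [(r, t) for r in range(3) for t in ("C", "O")]
print(to_vector(sphere_counts(adjacency, atom_types, kernel=2, max_radius=2), vocabulary))
# [0, 1, 1, 0, 1, 0]
print(sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
# (0.5, 0.6666666666666666)
```

With a high/low threshold on the Fukui function, a classifier tuned toward the majority "low" class would show the pattern reported in the abstract: high specificity alongside more modest sensitivity.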

Authors

  • Qingyou Zhang
    Institute of Environmental and Analytical Sciences, College of Chemistry and Chemical Engineering, Henan University, Kaifeng, 475004, PR China.
  • Fangfang Zheng
    Institute of Environmental and Analytical Sciences, College of Chemistry and Chemical Engineering, Henan University, Kaifeng, 475004, PR China.
  • Tanfeng Zhao
    Institute of Environmental and Analytical Sciences, College of Chemistry and Chemical Engineering, Henan University, Kaifeng, 475004, PR China.
  • Xiaohui Qu
    Environmental Energy Technology Division, Lawrence Berkeley National Laboratory, Berkeley, USA.
  • João Aires-de-Sousa
    LAQV-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal. Phone/fax: +351 21 2948300. Email: joao@airesdesousa.com.