pKa Predicting Models Trained with a Tautomer-Compatible Graph Dataset with Quantum Chemical Features

Journal: Journal of Chemical Information and Modeling

Abstract

We present G-pKa, a database of 6379 experimental pKa values and quantum mechanical (QM) properties computed for more than 39,000 structures, together with pKa prediction models trained on it. The data include molecular, atomic, and interatomic properties and are organized as graphs, so they can be used directly to train graph neural networks. In total, 309 features are extracted, including 99 atomic and 51 interatomic properties. The extraction procedure handles the presence of multiple conformers or tautomeric forms; indeed, 22% of the data correspond to acid-base equilibria in which more than one tautomer is involved in the acid, the base, or both species. Using these data, two models are trained, one based on an ensemble of trees and the other on graph isomorphism network (GIN) layers, and both are tested on four SAMPL pKa challenge datasets. The models show excellent performance and, on three of these datasets, outperform every prediction algorithm previously reported in the literature. The data and the scripts used to process them are publicly available.
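The abstract describes each species as a graph: atomic properties on the nodes, interatomic properties on the edges, and molecular properties at the graph level, consumed by graph isomorphism layers. A minimal plain-Python sketch of such a record and of one generic GIN update, h_v' = MLP((1 + eps) * h_v + sum of h_u over neighbors u), follows. The class and function names, the toy feature values, and the use of a single linear layer plus ReLU in place of GIN's MLP are all hypothetical illustrations; this is not the G-pKa file format, its 309 features, or the authors' architecture.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpeciesGraph:
    """Hypothetical container for one acid or base species (not the G-pKa format).

    atom_features: one vector of atomic properties per node (atom).
    edges: undirected atom-index pairs.
    edge_features: one vector of interatomic properties per edge.
    mol_features: graph-level (molecular) properties.
    """
    atom_features: list[list[float]]
    edges: list[tuple[int, int]]
    edge_features: list[list[float]]
    mol_features: list[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.edges) != len(self.edge_features):
            raise ValueError("need exactly one feature vector per edge")

    def neighbors(self, v: int) -> list[int]:
        """Indices of atoms sharing an edge with atom v."""
        return [b if a == v else a for a, b in self.edges if v in (a, b)]


def gin_layer(g: SpeciesGraph, weight: list[list[float]], bias: list[float],
              eps: float = 0.0) -> list[list[float]]:
    """One GIN-style update for every atom v:
        h_v' = ReLU(W @ ((1 + eps) * h_v + sum_{u in N(v)} h_u) + b)
    Assumption: a single linear layer plus ReLU stands in for GIN's MLP.
    Plain GIN does not use g.edge_features.
    """
    updated = []
    for v, h_v in enumerate(g.atom_features):
        agg = [(1.0 + eps) * x for x in h_v]          # weighted self term
        for u in g.neighbors(v):                       # add each neighbor's vector
            agg = [a + x for a, x in zip(agg, g.atom_features[u])]
        updated.append([max(0.0, sum(w * a for w, a in zip(row, agg)) + b)
                        for row, b in zip(weight, bias)])
    return updated


def sum_readout(node_features: list[list[float]]) -> list[float]:
    """Permutation-invariant graph embedding: elementwise sum over atoms."""
    return [sum(col) for col in zip(*node_features)]


# Toy 3-atom chain with 2 made-up atomic features and 1 made-up edge feature.
g = SpeciesGraph(
    atom_features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    edges=[(0, 1), (1, 2)],
    edge_features=[[1.0], [1.0]],
)
identity = [[1.0, 0.0], [0.0, 1.0]]
h = gin_layer(g, identity, [0.0, 0.0])
print(h)               # [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
print(sum_readout(h))  # [4.0, 5.0]
```

The sum aggregation and sum readout are what make GIN layers invariant to atom ordering, which is why the same species can be stored with any atom numbering. How the authors' model incorporates the interatomic (edge) features is not stated here, so the sketch leaves them unused.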
