Machine learning meets pKa.

Journal: F1000Research
Published Date:

Abstract

We present a small-molecule pKa prediction tool written entirely in Python. It predicts the macroscopic pKa value and is trained on a literature compilation of monoprotic compounds. Different machine learning models were tested, and random forest performed best in a five-fold cross-validation (mean absolute error = 0.682, root mean squared error = 1.032, correlation coefficient r² = 0.82). We test our model on two external validation sets, where it performs comparably to Marvin and better than a recently published open-source model. Our Python tool and all data are freely available at https://github.com/czodrowskilab/Machine-learning-meets-pKa.
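The three error measures quoted above can be illustrated with a minimal, standard-library-only sketch. It is not taken from the published tool: the function names `kfold_indices` and `regression_metrics` are hypothetical, the pKa values are made up for illustration, and r² is assumed here to mean the squared Pearson correlation between experimental and predicted values.

```python
import math
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, squared Pearson r) for paired value lists."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant values")
    r_squared = cov * cov / (var_t * var_p)
    return mae, rmse, r_squared


# Made-up experimental and predicted pKa values (illustration only).
experimental = [4.8, 9.2, 3.1, 7.4, 10.5]
predicted = [5.1, 8.7, 3.9, 7.0, 10.1]

folds = kfold_indices(len(experimental), k=5)
mae, rmse, r_squared = regression_metrics(experimental, predicted)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} r^2={r_squared:.3f}")
```

In a five-fold cross-validation, each fold would serve once as the held-out set while a model is fitted on the remaining four, and the metrics would then be computed on the held-out predictions.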

Authors

  • Marcel Baltruschat
    Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Strasse 6, 44227 Dortmund, Germany.
  • Paul Czodrowski
    Technical University of Dortmund, Dortmund, Germany.