Advancing Aqueous Solubility Prediction: A Machine Learning Approach for Organic Compounds Using a Curated Data Set.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Aqueous solubility is a key property of a chemical compound that determines its possible use in different applications, from drug development to materials science. In this work, we present a model for the prediction of aqueous solubility that leverages a curated data set merged from four distinct sources. This data set encompasses a diverse range of organic compounds, providing a robust foundation for our investigation of solubility prediction. Our approach employs a variety of machine learning and deep learning models that combine an extensive array of chemical descriptors, fingerprints, and functional groups. This methodology is designed to address the complexities of solubility prediction and is tailored to achieve high accuracy and generalization. We tested the finalized model on a diverse data set of 1282 unique organic compounds from the Huuskonen data set. The results of our analysis demonstrate the success of our model, which, given an R² value of 0.92 and an MAE value of 0.40, outperforms existing prediction methods for aqueous solubility on one of the most diverse data sets in the field.
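
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how a mean absolute error (MAE) and a coefficient of determination could be computed for a set of predictions. It assumes the standard textbook definitions and that the goodness-of-fit score quoted in the abstract is the coefficient of determination; the abstract does not state which implementation the authors used. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up solubility values for illustration only (NOT data from the paper).
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

print("MAE:", mean_absolute_error(measured, predicted))
print("R^2:", r_squared(measured, predicted))
```

Both metrics are computed on a held-out test set in typical practice: MAE is expressed in the same units as the target, whereas the coefficient of determination is unitless and equals 1 for perfect predictions.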

Authors

  • Mushtaq Ali
    Department of Computer and Software Technology, University of Swat, Swat, KP, Pakistan.
  • Sylvia Vanderheiden
    Institute of Biological and Chemical Systems (IBCS), Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe 76131, Germany.
  • Christoph W Grathwol
    Institute of Biological and Chemical Systems (IBCS), Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe 76131, Germany.
  • Konrad Krämer
    Institute of Biological and Chemical Systems (IBCS), Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe 76131, Germany.
  • Pascal Friederich
    Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, Karlsruhe 76131, Germany.
  • Nicole Jung
    Institute of Biological and Chemical Systems (IBCS), Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe 76131, Germany.
  • Stefan Bräse
    Institute of Biological and Chemical Systems (IBCS), Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe 76131, Germany.