Building Machine Learning Small Molecule Melting Points and Solubility Models Using CCDC Melting Points Dataset.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Predicting solubility of small molecules is a very difficult undertaking due to the lack of reliable and consistent experimental solubility data. It is well known that for a molecule in a crystal lattice to be dissolved, it must, first, dissociate from the lattice and then, second, be solvated. The melting point of a compound is proportional to the lattice energy, and the octanol-water partition coefficient (log P) is a measure of the compound's solvation efficiency. The CCDC's melting point dataset of almost one hundred thousand compounds was utilized to create widely applicable machine learning models of small molecule melting points. Using the general solubility equation, the aqueous thermodynamic solubilities of the same compounds can be predicted. The global model could be easily localized by adding additional melting point measurements for a chemical series of interest.
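
The step from a melting point and log P to a solubility estimate can be illustrated with a minimal sketch. It assumes the commonly cited Yalkowsky form of the general solubility equation, log S = 0.5 - 0.01 (MP - 25) - log P, with MP in degrees Celsius and S in mol/L; the abstract does not state which form the authors used. The function name and the input values are hypothetical and are not taken from the paper.

```python
def gse_log_solubility(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 of molar aqueous solubility (mol/L).

    Assumes the commonly cited Yalkowsky form of the general solubility
    equation; this is an illustration, not the paper's implementation.
    """
    # The melting term is taken as zero for compounds that are liquid
    # at 25 degrees Celsius (melting point at or below 25).
    melting_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melting_term - log_p


# Hypothetical compound: melting point 125 degrees Celsius, log P of 2.0.
example_log_s = gse_log_solubility(125.0, 2.0)
print(f"Estimated log S: {example_log_s:.2f}")
```

In a workflow like the one the abstract describes, the melting point argument would presumably come from the machine learning model's prediction rather than from a measurement, so any error in the predicted melting point carries into the solubility estimate at 0.01 log units per degree.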

Authors

  • Xiangwei Zhu
    School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China. Electronic address: s200231147@stu.cqupt.edu.cn.
  • Valery R Polyakov
    Novartis Institute for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States.
  • Krishna Bajjuri
    Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States.
  • Huiyong Hu
    Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States.
  • Andreas Maderna
    Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States.
  • Clare A Tovee
    Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.
  • Suzanna C Ward
    Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.