SolTranNet-A Machine Learning Tool for Fast Aqueous Solubility Prediction.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

While accurate prediction of aqueous solubility remains a challenge in drug discovery, machine learning (ML) approaches have become increasingly popular for this task. For instance, in the Second Challenge to Predict Aqueous Solubility (SC2), all groups utilized machine learning methods in their submissions. We present SolTranNet, a molecule attention transformer to predict aqueous solubility from a molecule's SMILES representation. Atypically, we demonstrate that larger models perform worse at this task, with SolTranNet's final architecture having 3,393 parameters while outperforming linear ML approaches. SolTranNet has a 3-fold scaffold split cross-validation root-mean-square error (RMSE) of 1.459 on AqSolDB and an RMSE of 1.711 on a withheld test set. We also demonstrate that, when used as a classifier to filter out insoluble compounds, SolTranNet achieves a sensitivity of 94.8% on the SC2 data set and is competitive with the other methods submitted to the competition. SolTranNet is distributed via pip, and its source code is available at https://github.com/gnina/SolTranNet.
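The two metrics quoted in the abstract, RMSE for regression and sensitivity for the filtering use case, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the helper names, the toy log-solubility values, the cutoff of -4.0, and the choice to treat "insoluble" as the positive class. It does not call the SolTranNet package; it only shows how such scores could be computed from a list of predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def flag_below(values, cutoff):
    """Turn continuous solubility values into boolean 'insoluble' flags."""
    return [v < cutoff for v in values]


def sensitivity(actual, flagged):
    """True positives / (true positives + false negatives)."""
    tp = sum(1 for a, f in zip(actual, flagged) if a and f)
    fn = sum(1 for a, f in zip(actual, flagged) if a and not f)
    if tp + fn == 0:
        raise ValueError("no positive examples in 'actual'")
    return tp / (tp + fn)


# Toy values (hypothetical, not from AqSolDB or SC2).
measured = [-1.0, -3.0, -5.0, -4.5]
predicted = [-1.5, -2.0, -4.5, -3.5]
CUTOFF = -4.0  # hypothetical threshold separating soluble from insoluble

regression_error = rmse(measured, predicted)
actual_insoluble = flag_below(measured, CUTOFF)
flagged_insoluble = flag_below(predicted, CUTOFF)
recall_insoluble = sensitivity(actual_insoluble, flagged_insoluble)

print(f"RMSE: {regression_error:.3f}")
print(f"Sensitivity: {recall_insoluble:.2f}")
```

With a real model, `predicted` would be replaced by the model's outputs for each SMILES string, and the cutoff and positive-class convention would need to match whatever the evaluation protocol defines.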

Authors

  • Paul G Francoeur
    Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, 3501 Fifth Avenue, Suite 3064, Biomedical Science Tower 3 (BST3), Pittsburgh, PA, 15260, USA.
  • David R Koes
    Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.