Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning.

Journal: Journal of Chemical Theory and Computation
PMID:

Abstract

We describe version 2 of the SPICE data set, a collection of quantum chemistry calculations for training machine learning potentials. It expands on the original data set with much broader sampling of chemical space and more data on noncovalent interactions. We use it to train a set of potential energy functions called Nutmeg, which are based on the TensorNet architecture. The models use a novel mechanism to improve performance on charged and polar molecules: precomputed partial charges are injected into the model to provide a reference for the large-scale charge distribution. Evaluation of the new models shows that they accurately reproduce energy differences between conformations, even for highly charged molecules and for molecules significantly larger than those in the training set. They also produce stable molecular dynamics trajectories and are fast enough to be useful for routine simulation of small molecules.
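
The abstract assesses the models by how well they reproduce energy differences between conformations. As a minimal sketch of one common way to score that, and not necessarily the protocol used in the paper, the hypothetical function below removes the mean energy from both the reference and the predicted values for a single molecule's conformers, then takes the mean absolute error of what remains. The function name, the inputs, and the choice of mean-centering are assumptions made for illustration.

```python
from statistics import mean


def relative_energy_mae(reference, predicted):
    """Mean absolute error of conformer energies after mean-centering.

    Hypothetical helper for illustration only. Both inputs hold energies
    (same units) for the same conformers of one molecule, in the same order.
    Subtracting each set's mean removes any constant offset, so only errors
    in the energy differences between conformers contribute.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    ref_mean = mean(reference)
    pred_mean = mean(predicted)
    return mean(
        abs((p - pred_mean) - (r - ref_mean))
        for r, p in zip(reference, predicted)
    )


# A constant shift between model and reference does not count as error.
shifted = relative_energy_mae([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
```

Mean-centering is used here because absolute quantum chemistry energies and model energies may differ by an arbitrary constant; other conventions, such as referencing every conformer to the lowest-energy one, would serve the same purpose.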

Authors

  • Peter Eastman
    Department of Chemistry, Stanford University, 337 Campus Drive, Stanford, California 94305, United States.
  • Benjamin P Pritchard
    Molecular Sciences Software Institute, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24060, United States.
  • John D Chodera
    Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States.
  • Thomas E Markland
    Department of Chemistry, Stanford University, Stanford, California 94305, United States.