Transferability of Data Sets between Machine-Learned Interatomic Potential Algorithms

Journal: Journal of Chemical Theory and Computation
Abstract

The emergence of Foundational Machine Learning Interatomic Potential (FMLIP) models trained on extensive data sets motivates attempts to transfer data between different ML architectures. Using a common battery electrolyte solvent as a test case, we examine the extent to which training data optimized for one machine-learning method can be reused by a different learning algorithm, with the aim of accelerating FMLIP fine-tuning and reducing the need for costly iterative training. We consider several types of training configurations and compare the benefits they bring to feedforward neural networks (the Deep Potential model) and message-passing networks (MACE). We propose a simple metric to assess model performance and demonstrate that MACE models perform well with even the simplest training sets, whereas simpler architectures require further iterative training to describe the target liquids correctly. We find that configurations designed by human intuition to correct systematic deficiencies of a model often transfer well between algorithms, but that reusing configurations generated automatically by one MLIP does not necessarily benefit a different algorithm. We also compare the performance of these bespoke models against two pretrained FMLIPs, demonstrating that system-specific training data are usually necessary for realistic models. Finally, we examine how training data sets affect a model's ability to generalize to unseen molecules, finding that model stability is conserved for small changes in molecular shape but not for changes in functional chemistry. Our results provide insight into how training set properties affect the behavior of an MLIP and offer principles for enhancing training sets for molecular liquid models with minimal computational effort. These approaches may be used in tandem with FMLIPs to dramatically accelerate the rate at which new chemical systems can be simulated.

Authors

  • Samuel P Niblett
    Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
  • Panagiotis Kourtis
    School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
  • Ioan-Bogdan Magdău
    School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
  • Clare P Grey
    Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
  • Gábor Csányi
