Locality-Sensitive Hashing-Based Data Set Reduction for Deep Potential Training.
Journal:
Journal of Chemical Theory and Computation
Published Date:
Jun 10, 2025
Abstract
Machine learning (ML) methods offer considerable scope for developing potentials of ab initio quality for diverse systems, ranging from simple fluids to complex solids. However, these methods typically require extensive data sets for effective model training, and the accuracy of the ML potential depends strongly on data quality, which necessitates expensive ab initio calculations. To address this challenge, we present a method based on locality-sensitive hashing that minimizes the data set size, thereby reducing the number of expensive quantum chemical calculations while preserving the data set's diversity and accuracy. Our approach achieves data set reductions of nearly an order of magnitude. To demonstrate the method's effectiveness, we applied it to develop ML potentials for studying a prototypical chemical reaction in an explicit solvent and a first-order phase transition. Finally, well-tempered metadynamics simulations using these ML potentials enabled us to calculate converged free energy surfaces for both the chemical reaction and the phase transition.
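The abstract does not specify the hashing scheme or the structural descriptors used, so the following is only a minimal sketch of the general idea, assuming random-hyperplane (sign random projection) locality-sensitive hashing over fixed-length descriptor vectors. The function names `make_hyperplanes`, `lsh_signature`, and `reduce_dataset` and the parameters `n_bits` and `seed` are hypothetical and are not taken from the article. Configurations whose descriptors fall into the same hash bucket are treated as near-duplicates, and only one representative per bucket is kept for the expensive ab initio calculation.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, hyperplanes):
    """Return one bit per hyperplane: which side of the plane vec lies on."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(plane, vec)) >= 0.0 else 0
        for plane in hyperplanes
    )


def reduce_dataset(descriptors, n_bits=8, seed=0):
    """Keep the index of the first descriptor seen in each hash bucket.

    Assumption: descriptors is a list of equal-length lists of floats,
    one per configuration. Similar vectors tend to share a signature,
    so each bucket is represented by a single configuration.
    """
    if not descriptors:
        return []
    planes = make_hyperplanes(len(descriptors[0]), n_bits, seed)
    first_in_bucket = {}
    for idx, vec in enumerate(descriptors):
        first_in_bucket.setdefault(lsh_signature(vec, planes), idx)
    return sorted(first_in_bucket.values())


# Toy descriptors: entries 1 and 2 point in the same direction as entry 0,
# while entry 3 points in the opposite direction.
descriptors = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [-1.0, -2.0, -3.0],
]
kept = reduce_dataset(descriptors, n_bits=8, seed=0)
```

In this toy input the first three descriptors share a bucket and the fourth lands in a different one, so `kept` holds the indices 0 and 3. In practice, the number of hash bits controls how aggressively the data set is thinned: more bits give finer buckets and retain more configurations.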