Locality-Sensitive Hashing-Based Data Set Reduction for Deep Potential Training.

Journal: Journal of chemical theory and computation
Published Date:

Abstract

Machine learning (ML) methods offer considerable scope for developing ab initio-quality potentials for diverse systems, ranging from simple fluids to complex solids. However, these methods typically require extensive data sets for effective model training, and the accuracy of the ML potential is highly dependent on data quality, necessitating expensive ab initio calculations. To address this challenge, we present a novel method based on locality-sensitive hashing, designed to minimize the data set size, thereby reducing the number of expensive quantum chemical calculations while preserving the data set's diversity and accuracy. Our approach achieves data set reductions of nearly an order of magnitude. To demonstrate the method's effectiveness, we applied it to develop ML potentials to study a prototypical chemical reaction in an explicit solvent and a first-order phase transition. Finally, well-tempered metadynamics simulations utilizing these ML potentials enabled us to calculate the converged free energy surfaces for both the chemical reaction and the phase transition.
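
The abstract does not specify the hashing scheme or the structural descriptors used, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each configuration is already represented by a fixed-length descriptor vector (hypothetical here), uses random-hyperplane (sign) hashing as one common locality-sensitive hash family, and keeps a single representative per hash bucket. The function name `lsh_reduce` and the parameters `n_bits` and `seed` are made up for this example.

```python
import random


def lsh_reduce(descriptors, n_bits=8, seed=0):
    """Return indices of one representative configuration per LSH bucket.

    Assumption: `descriptors` is a non-empty list of equal-length float vectors.
    """
    rng = random.Random(seed)
    dim = len(descriptors[0])
    # Random hyperplanes through the origin, defined by Gaussian normal vectors.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    buckets = {}
    for idx, vec in enumerate(descriptors):
        # One bit per hyperplane: the side of the plane on which the vector lies.
        key = tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes
        )
        # The first configuration seen in a bucket represents that bucket.
        buckets.setdefault(key, idx)
    return sorted(buckets.values())


# Toy data: two tight clusters of 50 five-dimensional descriptors each.
toy_rng = random.Random(1)
toy = [
    [toy_rng.gauss(center, 0.01) for _ in range(5)]
    for center in (-1.0, 1.0)
    for _ in range(50)
]
kept = lsh_reduce(toy, n_bits=6)
print(len(toy), "->", len(kept))
```

Only the configurations indexed by `kept` would then need to be labeled with ab initio calculations. Increasing `n_bits` produces finer buckets and retains more configurations. Because sign hashing depends only on a vector's direction, descriptors in a real application would likely need centering or normalization first, which is again an assumption rather than something stated in the abstract.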

Authors

  • Anmol
    Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India.
  • Anuj Kumar Sirohi
    Yardi School of Artificial Intelligence, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India.
  • Neha
    Department of Computer Science & Engineering, Chandigarh University, Mohali, 140413, India.
  • Jayadeva
    Department of Electrical Engineering, Indian Institute of Technology, Delhi, India. Electronic address: jayadeva@ee.iitd.ac.in.
  • Sandeep Kumar
    Cellon S.A., ZAE Robert Steichen, 16 rue Hèierchen, L-4940, Bascharage, Luxembourg.
  • Tarak Karmakar
    Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India.
