Locality-Sensitive Hashing-Based Data Set Reduction for Deep Potential Training.

Journal: Journal of chemical theory and computation

Published Date: Jun 10, 2025

Abstract

Machine learning (ML) methods offer considerable scope for developing ab initio quality potentials for diverse systems, ranging from simple fluids to complex solids. However, these methods typically require extensive data sets for effective model training, and the accuracy of the ML potential depends strongly on data quality, which necessitates expensive ab initio calculations. To address this challenge, we present a novel method based on locality-sensitive hashing, designed to minimize the data set size, thereby reducing the number of expensive quantum chemical calculations while preserving the data set's diversity and accuracy. Our approach achieves data set reductions of nearly an order of magnitude. To demonstrate the method's effectiveness, we applied it to develop ML potentials for two problems: a prototypical chemical reaction in an explicit solvent and a first-order phase transition. Finally, well-tempered metadynamics simulations using these ML potentials enabled us to calculate converged free energy surfaces for both the chemical reaction and the phase transition.
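
The abstract does not specify the hash family or the structural descriptors used, so the following is only a minimal sketch of the general idea of hashing-based data set reduction, not the authors' implementation. It assumes each configuration is already represented by a fixed-length descriptor vector, and it uses random-hyperplane (sign) hashing as a stand-in locality-sensitive hash. The function name `lsh_reduce` and the parameters `n_bits` and `seed` are hypothetical. Configurations that fall into the same hash bucket are treated as near-duplicates, and only the first one per bucket is kept for the expensive ab initio labeling step.

```python
import numpy as np


def lsh_reduce(descriptors, n_bits=16, seed=0):
    """Return indices of one representative per random-hyperplane LSH bucket.

    Illustrative sketch only: `descriptors` is assumed to be an
    (n_samples, n_features) array of per-configuration feature vectors.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, dtype=float)
    # Center the data so hyperplanes through the origin actually split it.
    X = X - X.mean(axis=0)
    # Each column is the normal vector of one random hyperplane.
    planes = rng.standard_normal((X.shape[1], n_bits))
    # Sign pattern of the projections gives an n_bits signature per sample.
    bits = (X @ planes) > 0
    # Pack each signature into bytes so it can be used as a dictionary key.
    keys = [row.tobytes() for row in np.packbits(bits, axis=1)]
    first_in_bucket = {}
    for idx, key in enumerate(keys):
        # Keep only the first configuration seen in each bucket.
        first_in_bucket.setdefault(key, idx)
    return sorted(first_in_bucket.values())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    base = rng.standard_normal((50, 8))
    # Simulate a redundant trajectory: every configuration appears three times.
    data = np.vstack([base, base, base])
    kept = lsh_reduce(data)
    print(f"kept {len(kept)} of {len(data)} configurations")
```

In this sketch, `n_bits` controls the trade-off: more bits give finer buckets and a larger retained set, while fewer bits merge more configurations and reduce the data set more aggressively.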

Authors

Anmol
Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India.

Anuj Kumar Sirohi
Yardi School of Artificial Intelligence, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India.

Neha
Department of Computer Science & Engineering, Chandigarh University, Mohali, 140413, India.

Jayadeva
Department of Electrical Engineering, Indian Institute of Technology, Delhi, India. Electronic address: jayadeva@ee.iitd.ac.in.

Sandeep Kumar
Cellon S.A., ZAE Robert Steichen, 16 rue Hèierchen, L-4940, Bascharage, Luxembourg.

Tarak Karmakar
Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India.

External Resources

PubMed ID: 40494814
