Evaluating Molecular Similarity Measures: Do Similarity Measures Reflect Electronic Structure Properties?

Journal: Journal of Chemical Information and Modeling

PMID: 40299458

Abstract

The rapid adoption of big data, machine learning (ML), and generative artificial intelligence (AI) in chemical discovery has heightened the importance of quantifying molecular similarity. Molecular similarity, commonly assessed as the distance between molecular fingerprints, is integral to applications such as database curation, diversity analysis, and property prediction. AI tools frequently rely on these similarity measures to cluster molecules under the assumption that structurally similar molecules exhibit similar properties. However, this assumption is not universally valid, particularly for continuous properties like electronic structure properties. Despite the prevalence of fingerprint-based similarity measures, their evaluation has largely depended on biological activity data sets and qualitative metrics, limiting their relevance for nonbiological domains. To address this gap, we propose a framework to evaluate the correlation between molecular similarity measures and molecular properties. Our approach builds on the concept of neighborhood behavior and incorporates kernel density estimation (KDE) analysis to quantify how well similarity measures capture property relationships. Using a data set of over 350 million molecule pairs with electronic structure, redox, and optical properties, we systematically evaluate how well similarity measures built from several molecular fingerprint generators and distance functions correlate with these properties. Both the curated data set and the evaluation framework are publicly available.
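The abstract does not spell out the exact KDE-based metric, so the following is only a minimal, hypothetical sketch of the general idea: compute a fingerprint distance and a property difference for every molecule pair, smooth those pairs with a 2D Gaussian KDE, and ask how often "close" pairs also have close property values. It assumes fingerprints are stored as sets of on-bit indices (in practice they would come from a generator such as RDKit's Morgan fingerprints), uses distance = 1 − Tanimoto, and defines the score as the KDE estimate of P(|Δp| ≤ p_tol | d ≤ d_cut). The function names, thresholds, and bandwidths are illustrative assumptions, not the authors' framework.

```python
import math
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pair_table(mols):
    """Return (distance, |property difference|) for every unordered pair of molecules.

    `mols` is a list of (fingerprint_bits, property_value) tuples; distance = 1 - Tanimoto.
    """
    return [
        (1.0 - tanimoto(fa, fb), abs(pa - pb))
        for (fa, pa), (fb, pb) in combinations(mols, 2)
    ]


def _norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_neighborhood_score(pairs, d_cut, p_tol, h_d, h_p):
    """Hypothetical neighborhood-behavior score from a 2D Gaussian KDE.

    A product Gaussian kernel (bandwidths h_d, h_p) is centred on every
    (distance, delta_property) pair; the function returns the KDE estimate of
    P(delta_property <= p_tol | distance <= d_cut). Values near 1 mean that pairs
    the measure calls "close" also have close property values.
    """
    num = den = 0.0
    for d, dp in pairs:
        w_d = _norm_cdf((d_cut - d) / h_d)   # kernel mass at distance <= d_cut
        w_p = _norm_cdf((p_tol - dp) / h_p)  # kernel mass at delta_property <= p_tol
        num += w_d * w_p
        den += w_d
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    # Toy data: two structural families of three "molecules" each.
    fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5},
           {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 12, 13, 14}]
    smooth = [0.0, 0.1, 0.05, 2.0, 2.1, 2.05]     # property follows structure
    scrambled = [0.0, 2.0, 0.1, 2.1, 0.05, 2.05]  # property ignores structure
    for name, props in [("smooth", smooth), ("scrambled", scrambled)]:
        pairs = pair_table(list(zip(fps, props)))
        score = kde_neighborhood_score(pairs, d_cut=0.5, p_tol=0.5, h_d=0.05, h_p=0.1)
        print(name, round(score, 2))  # smooth 1.0, scrambled 0.34
```

Because each kernel is a product of two Gaussians, its mass in the region d ≤ d_cut, |Δp| ≤ p_tol factorizes into two normal CDFs, so no density grid is needed. At the scale of hundreds of millions of pairs one would likely vectorize this (for example with NumPy) and pick bandwidths with a standard rule such as Scott's rather than by hand.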

Authors

Rebekah Duke

Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States.

Chih-Hsuan Yang

Department of Mechanical Engineering and Translational AI Research and Education Center, Iowa State University, Ames, Iowa 50011, United States.

Baskar Ganapathysubramanian

Department of Mechanical Engineering and Translational AI Research and Education Center, Iowa State University, Ames, Iowa 50011, United States.

Chad Risko

Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States.

Keywords

Electrons; Machine Learning; Molecular Structure
