The topology of molecular representations and its influence on machine learning performance.

Journal: Journal of Cheminformatics

Published Date: Jul 21, 2025

Abstract

Advancements in cheminformatics have led to numerous methods for encoding molecules numerically. The choice of molecular representation impacts the accuracy and generalizability of learning algorithms applied to chemical datasets. Designing and selecting an appropriate representation often lacks a systematic approach and instead relies on computationally exhaustive empirical testing. Moreover, research has shown that deep learning models do not substantially outperform traditional approaches across many tasks, with no clear explanation for this shortfall. In this work, we present TopoLearn, a model that predicts the effectiveness of representations on datasets based on the topological characteristics of the corresponding feature space. Using interpretability techniques, we find that persistent homology descriptors are linked with the error metrics of trained machine learning models, offering a new method to better understand and select molecular representations.

Scientific contribution: Our research is the first to establish an empirical connection between the topology of feature spaces and the machine learning performance of molecular representations. In addition, we facilitate future research endeavors by providing open access to our developed model.
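
To make the notion of a persistent homology descriptor of a feature space more concrete, the following is a minimal sketch, not the TopoLearn implementation. The abstract does not state which descriptors the model uses, so the function names `h0_death_times` and `h0_summary` and the chosen summary statistics are illustrative assumptions. The sketch computes the 0-dimensional persistence (connected components) of a Vietoris-Rips filtration over a set of feature vectors, using the standard fact that the scales at which components merge are the edge lengths of a minimum spanning tree.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of 0-dimensional persistence classes for a
    Vietoris-Rips filtration on `points` (Euclidean distance).

    Every point is born at scale 0; a component dies at the edge length
    at which it merges into another one. These lengths are the edge
    weights of a minimum spanning tree (Kruskal's algorithm).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths  # n - 1 finite values, in ascending order


def h0_summary(points):
    """Illustrative (assumed) scalar descriptors of the H0 diagram."""
    deaths = h0_death_times(points)
    return {
        "n_finite_classes": len(deaths),
        "total_persistence": sum(deaths),
        "max_persistence": max(deaths, default=0.0),
    }
```

For three collinear points at positions 0, 1, and 5, `h0_death_times([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])` yields the merge scales 1 and 4. Descriptors of this kind, computed per representation and dataset, are the sort of input one could relate to model error; higher-dimensional homology would require a dedicated topological data analysis library.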

Authors

Florian Rottach
Boehringer Ingelheim Pharma GmbH & Co. KG, 88397 Biberach, Germany.

Sebastian Schieferdecker
Boehringer Ingelheim Pharma GmbH & Co. KG, 88397 Biberach, Germany.

Carsten Eickhoff
Department of Computer Science, ETH Zurich, Zurich, Switzerland; Center for Biomedical Informatics, Brown University, Providence, RI, USA.

External Resources

PubMed ID: 40691856
