ConfRank+: Extending Conformer Ranking to Charged Molecules.
Journal: Journal of Chemical Information and Modeling
Published Date: Aug 7, 2025
Abstract
We present a machine learning model for high-throughput energetic ranking of charged molecular conformers. Building on the ConfRank approach (Hölzer et al., 8909-8925), the model is trained in a pairwise fashion to predict energy differences between pairs of conformers. By conditioning the model on data set embedding vectors, we can train it on two different reference levels simultaneously. This enlarges the available training data and allows the model to emulate multiple reference methods. Specifically, we train on a large subset of the SPICE 2.0.1 data set with ωB97M-D3(BJ)/def2-TZVPPD range-separated hybrid meta-GGA DFT reference computations, and on an in-house conformer data set derived from the GEOM data set with r2SCAN-3c references. The result is a single multifidelity model that reproduces both reference levels to within typical ML model errors for small- and medium-sized molecules containing the elements H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I. By including partial atomic charges obtained from the electronegativity equilibration (EEQ) charge model, the model incorporates information about the charge distribution in a molecule. This enables the treatment of charged closed-shell species and an explicit treatment of electrostatic interactions. We test the ranking capability of the model on various data sets, paying special attention to molecular charges of −1, 0, and +1. Across all tests, we find our model to be as accurate as current AIMNet2 and MACE-OFF23(L) models while requiring an order of magnitude fewer parameters. It also matches the robustness of the state-of-the-art semiempirical quantum method GFN2-xTB.
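The abstract names the electronegativity equilibration (EEQ) model as the source of the partial charges that let the model handle a net molecular charge. As an illustration only, the following is a minimal Python/NumPy sketch of a generic EEQ solve. It assumes Gaussian-smeared charges with erf-damped Coulomb interactions, the functional form used by the EEQ model of the D4 dispersion correction. It is simplified: fixed per-atom electronegativities are used, whereas published EEQ parameterizations typically make them depend on the coordination number. The function name `eeq_charges` and all parameter values are hypothetical, and this is not ConfRank+ code. The key ingredient is the extra Lagrange-multiplier row, which forces the charges to sum to the molecular charge (for example −1, 0, or +1).

```python
import math

import numpy as np


def eeq_charges(coords, chi, hardness, alpha, total_charge=0.0):
    """Simplified electronegativity-equilibration (EEQ) partial charges.

    Hypothetical sketch: coords is (N, 3) in bohr; chi, hardness and alpha
    are per-atom parameters (illustrative values, not a published set).
    Minimizes E(q) = sum_i [chi_i q_i + 0.5 A_ii q_i^2] + sum_{i<j} A_ij q_i q_j
    subject to sum_i q_i = total_charge, via a Lagrange multiplier.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    a = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for i in range(n):
        # Diagonal: atomic hardness plus self-interaction of a Gaussian charge.
        a[i, i] = hardness[i] + math.sqrt(2.0 / math.pi) / alpha[i]
        for j in range(i + 1, n):
            r = float(np.linalg.norm(coords[i] - coords[j]))
            gamma = 1.0 / math.sqrt(alpha[i] ** 2 + alpha[j] ** 2)
            # Off-diagonal: Coulomb interaction between two Gaussian charges.
            a[i, j] = a[j, i] = math.erf(gamma * r) / r
        # Lagrange-multiplier column/row enforcing total charge conservation.
        a[i, n] = a[n, i] = 1.0
        b[i] = -chi[i]
    b[n] = total_charge
    solution = np.linalg.solve(a, b)
    return solution[:n]  # drop the Lagrange multiplier


# Toy heteronuclear "diatomic": the atom with the larger chi ends up negative.
coords = [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
q = eeq_charges(coords, chi=[0.2, 0.5], hardness=[0.5, 0.6], alpha=[1.0, 1.2])
print(q, q.sum())
```

Passing `total_charge=-1.0` instead distributes an extra electron over the same atoms, which is the mechanism that gives an otherwise charge-agnostic model access to the net charge of an anion or cation.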