State Ensemble Energy Recognition (SEER): A Hybrid Gas-Phase Molecular Charge State Predictor.
Journal:
Journal of Chemical Information and Modeling
Published Date:
Jul 8, 2025
Abstract
Accurately resolving a three-dimensional structure that corresponds to an experimental mass spectrometry (MS) result is valuable for improved analyte identification, determination of conformation-dependent physicochemical properties, analyte impurity testing, and analysis of drug chemical integrity. Computational approaches that combine charge state modeling, conformational sampling, quantum mechanical optimization, relative energy scoring, and computed ion-neutral collision cross sections (CCS) have historically succeeded at assigning equilibrium structures to CCS values derived from ion-mobility MS. Despite this success, there remains a lack of new computational software that achieves higher throughput when modeling large systems. A major driver of computational cost is that the number of titratable sites generally grows with molecular size, which requires modeling additional protonation/deprotonation states to ensure that the correct charge state is captured. Here, we introduce SEER (State Ensemble Energy Recognition), a user-friendly machine learning program that accurately and efficiently predicts the equilibrium charge states of MS-relevant ions. For every system in the test set, whose molecules contained an overall average of ∼7 titratable sites, SEER captured the lowest relative energy charge state within its top two predicted candidates. Furthermore, density functional theory optimized geometries of the SEER-assigned charge states produced CCS values whose errors relative to experiment fall within the acceptable threshold set for this work (i.e., ≤3% error). A benchmark study comparing SEER with two well-established charge state prediction software packages, CREST and Epik Classic, found that SEER is on par with or better than both at consistently locating the correct charge states for the test set, with competitive efficiency.
SEER requires no additional user programming and is readily accessible through the Google Colab platform at https://github.com/mitkeng/SEER.
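The abstract rests on three quantities: how fast the number of candidate protonation states grows with the number of titratable sites, the relative CCS error checked against the ≤3% threshold, and whether the reference charge state appears among the top two ranked predictions. The sketch below illustrates all three under stated assumptions; it is not SEER's implementation. It assumes each titratable site is strictly binary (protonated or not, ignoring tautomers and multi-state sites), assumes the usual unsigned relative error for CCS, and uses hypothetical helper names (`protonation_microstates`, `total_microstates`, `ccs_percent_error`, `within_threshold`, `in_top_k`) introduced only for illustration.

```python
from itertools import combinations
from math import comb


def protonation_microstates(n_sites: int, n_protonated: int) -> list[tuple[int, ...]]:
    """Enumerate assignments with exactly n_protonated of n_sites protonated.

    Assumption: each site is binary (1 = protonated, 0 = not), so this
    undercounts systems with tautomers or multi-state sites.
    """
    states = []
    for chosen in combinations(range(n_sites), n_protonated):
        states.append(tuple(1 if i in chosen else 0 for i in range(n_sites)))
    return states


def total_microstates(n_sites: int) -> int:
    # Summing C(n, k) over every possible protonation count k gives 2**n.
    return sum(comb(n_sites, k) for k in range(n_sites + 1))


def ccs_percent_error(ccs_calc: float, ccs_exp: float) -> float:
    # Unsigned deviation from experiment in percent; multiplying before
    # dividing keeps values such as 6 * 100 / 200 exactly at 3.0.
    return abs(ccs_calc - ccs_exp) * 100.0 / ccs_exp


def within_threshold(ccs_calc: float, ccs_exp: float, threshold_pct: float = 3.0) -> bool:
    # Hypothetical acceptance check mirroring the <=3% criterion.
    return ccs_percent_error(ccs_calc, ccs_exp) <= threshold_pct


def in_top_k(ranked_states: list[str], reference: str, k: int = 2) -> bool:
    # True if the reference (lowest relative energy) state is among the
    # first k candidates of a ranked prediction list.
    return reference in ranked_states[:k]


if __name__ == "__main__":
    n = 7  # roughly the average number of titratable sites reported above
    print("all binary microstates:", total_microstates(n))
    for k in range(4):
        print(f"{k} protonated:", len(protonation_microstates(n, k)))
    print("CCS error (%):", ccs_percent_error(205.0, 200.0))
    print("within 3%:", within_threshold(206.0, 200.0))
    print("top-2 hit:", in_top_k(["state_A", "state_B", "state_C"], "state_B"))
```

Even under the binary-site simplification, seven sites already give 2^7 = 128 microstates, and each one would need conformational sampling and quantum mechanical optimization in a brute-force workflow. That growth is why ranking a short list of likely charge states up front, and only then optimizing them, can save substantial computation.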