Multi-condition machine learning models for understanding retention mechanisms and predicting retention time in supercritical fluid chromatography/mass spectrometry.

Journal: Analytica Chimica Acta
Published Date:

Abstract

BACKGROUND: Modern supercritical fluid chromatography (SFC) enables fast and efficient separations owing to the low viscosity and high diffusivity of supercritical mobile phases. However, its retention mechanisms remain incompletely understood, limiting method development and confident compound identification in SFC/MS. In this study, the retention times (RTs) of 1217 compounds measured under 51 chromatographic conditions, covering 15 stationary phases, three modifier chemistries (neutral, acidic, and basic), and two gradient programs, were analyzed to develop RT prediction models and elucidate the underlying retention mechanisms.

RESULTS: Gradient boosting (GB) models were first trained separately for each condition using the measured RTs together with 2285 molecular descriptors. Then, for the first time, system descriptors encoding chromatographic conditions (i.e., stationary phase, modifier, and gradient type) were introduced to integrate these individual models into multi-condition models. These models achieved high predictive accuracy, with R² values of 0.951 and 0.923 and mean absolute errors (MAE) of 0.613 and 0.520 min for Gradients 1 (G1) and 2 (G2), respectively. To interpret retention mechanisms, GB-selected descriptors were quantified using partial least squares (PLS), classified into 10 physicochemical categories, and evaluated using the normalized combination effect (nCE) across conditions. Subsequently, RT shift analysis revealed the most pronounced differences between neutral and acidic media. Finally, heatmaps for each stationary phase summarized peak quality and detection percentages for functional group clusters.

SIGNIFICANCE: By introducing system descriptors, this study established multi-condition RT prediction models that accurately predict retention across diverse SFC conditions. Moreover, comprehensive descriptor-based analysis under 51 conditions elucidated the underlying retention mechanisms and provided a practical framework for selecting optimal analytical conditions.
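
As an illustration of the system-descriptor idea, the following minimal Python sketch shows one way a chromatographic condition could be encoded and appended to a compound's molecular descriptors, so that a single model sees rows from many conditions. The abstract does not state how the system descriptors were encoded, so the one-hot scheme, the placeholder stationary-phase labels (SP01 to SP15), and all function names here are assumptions made for illustration only. The two metric helpers compute MAE and R² in their standard forms.

```python
# Hypothetical category labels. The abstract reports 15 stationary phases,
# three modifier chemistries, and two gradient programs, but does not name
# the phases, so "SP01".."SP15" are placeholders.
STATIONARY_PHASES = [f"SP{i:02d}" for i in range(1, 16)]
MODIFIERS = ["neutral", "acidic", "basic"]
GRADIENTS = ["G1", "G2"]


def one_hot(value, categories):
    """Return a 0/1 vector marking the position of value in categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def system_descriptors(stationary_phase, modifier, gradient):
    """Encode one chromatographic condition (assumed one-hot scheme)."""
    return (
        one_hot(stationary_phase, STATIONARY_PHASES)
        + one_hot(modifier, MODIFIERS)
        + one_hot(gradient, GRADIENTS)
    )


def build_row(molecular_descriptors, stationary_phase, modifier, gradient):
    """Concatenate molecular and system descriptors into one feature row."""
    return list(molecular_descriptors) + system_descriptors(
        stationary_phase, modifier, gradient
    )


def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between measured and predicted RTs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Under this assumed encoding, each row would hold the 2285 molecular descriptors followed by 20 system descriptors (15 + 3 + 2). Rows built this way for every compound and condition could then be passed to any gradient boosting regressor. Because the abstract reports metrics separately for G1 and G2, the gradient columns would be dropped if one model were trained per gradient.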
