Enhancing lipid identification in LC-HRMS data through machine learning-based retention time prediction.
Journal: Journal of chromatography. A
PMID: 39798479
Abstract
The comprehensive identification of peaks in untargeted lipidomics using LC-MS/MS remains a significant challenge. Integrating a highly accurate machine learning-based retention time prediction model can substantially increase confidence in lipid annotation. This supports lipid identification in studies of pathogenic mechanisms, in biomarker discovery, and in drug screening. In this study, we developed a machine learning model that predicts retention times to facilitate lipid peak annotation in LC-MS-based untargeted lipidomics. The model achieved correlation coefficients of 0.998 and 0.990, with mean absolute errors (MAE) of 0.107 min and 0.240 min, for the training and test sets, respectively. External validation showed similarly strong performance, with correlation coefficients of 0.991 and 0.978 and MAE values of 0.241 min and 0.270 min. We also compared how molecular descriptors and molecular fingerprints affect model performance. With Random Forest (RF) as the learning algorithm, molecular descriptors outperformed molecular fingerprints across all datasets. Notably, the retention time calibration model performs robustly across chromatographic systems with comparable gradients and flow rates. Overall, the model improves lipid annotation accuracy and reduces errors in untargeted lipidomics data analysis across multiple datasets.
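The abstract reports model quality as a correlation coefficient and an MAE between observed and predicted retention times (RT). It also states that predicted RTs raise confidence in lipid annotation. The following is a minimal, stdlib-only Python sketch of those two evaluation and usage steps. It does not reproduce the authors' RF model or descriptors. Several details are assumptions, because the abstract does not state them. Correlation is computed here as Pearson's r. Annotation filtering is shown as a simple "keep candidates whose predicted RT falls within a tolerance window of the observed peak RT" rule. The function names, the `Candidate` type, the tolerance value, and all numbers in the usage lines are hypothetical and illustrative only.

```python
import math
from dataclasses import dataclass


def mean_absolute_error(observed, predicted):
    """MAE between observed and predicted RTs (same units, e.g. minutes)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed metric; the abstract says only 'correlation')."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sd_o * sd_p)


@dataclass
class Candidate:
    """Hypothetical candidate annotation for one MS peak."""
    name: str
    predicted_rt: float  # minutes, as output by some RT prediction model


def filter_by_rt(observed_rt, candidates, tolerance):
    """Keep candidates whose predicted RT is within +/- tolerance of the observed RT.

    The window rule and the tolerance are illustrative assumptions,
    not the paper's stated annotation criterion.
    """
    return [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance]


if __name__ == "__main__":
    # Toy observed vs. predicted RTs (made-up values, not data from the study).
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.1, 1.9, 3.2, 3.9]
    print(f"r = {pearson_r(obs, pred):.3f}, MAE = {mean_absolute_error(obs, pred):.3f} min")

    # Made-up candidates for a peak observed at 5.20 min, with a hypothetical 0.5 min window.
    cands = [Candidate("PC 34:1", 5.35), Candidate("PE 36:2", 6.10), Candidate("PC 32:0", 4.95)]
    kept = filter_by_rt(5.20, cands, tolerance=0.5)
    print([c.name for c in kept])  # -> ['PC 34:1', 'PC 32:0']
```

A fixed window is the simplest choice for this sketch. In practice, the tolerance would plausibly be tied to the model's observed error (for example, a multiple of the test-set MAE), but that is a design assumption rather than something the abstract specifies.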