Machine Learning-Based Retention Time Prediction Tool for Routine LC-MS Data Analysis.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Accurate retention time (RT) prediction models can significantly improve liquid chromatography-mass spectrometry (LC-MS) data analysis, which is widely used in chemical synthesis. As hundreds of thousands of syntheses are performed annually at Enamine, a large amount of experimental data has been generated internally. In this paper, we present the development of an RT prediction model based on the GATv2Conv + DL graph neural network (NN) architecture, trained on the internal data and further evaluated using the METLIN SMRT data set. The final model achieved a mean absolute error (MAE) of 2.48 s for the 120 s LC-MS method. We also conducted a detailed analysis of prediction errors and determined that the interval between -7.12 s and +9.58 s contained over 95% of the data. The developed model has been successfully integrated into the existing in-house LC-MS analysis toolkit, enhancing its predictive and analytical capabilities. Additionally, we have published a curated subset of 20,000 data points from our internal data set to support community benchmarking and further research.
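
The two error statistics quoted above (an MAE and an interval containing about 95% of the prediction errors) can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that the error is defined as predicted minus observed retention time and that the interval is taken from the empirical 2.5th and 97.5th percentiles; the abstract does not state either choice. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def mean_absolute_error(rt_observed, rt_predicted):
    """Mean absolute error between observed and predicted retention times (s)."""
    rt_observed = np.asarray(rt_observed, dtype=float)
    rt_predicted = np.asarray(rt_predicted, dtype=float)
    return float(np.mean(np.abs(rt_predicted - rt_observed)))


def central_error_interval(rt_observed, rt_predicted, coverage=0.95):
    """Empirical interval holding the central `coverage` share of signed errors.

    Assumption: signed error = predicted - observed, and the interval bounds
    are the matching lower and upper percentiles of those errors.
    """
    errors = np.asarray(rt_predicted, dtype=float) - np.asarray(rt_observed, dtype=float)
    tail = (1.0 - coverage) / 2.0 * 100.0  # percent left out in each tail
    lower, upper = np.percentile(errors, [tail, 100.0 - tail])
    return float(lower), float(upper)


# Synthetic data for illustration only; these are not values from the paper.
rng = np.random.default_rng(0)
rt_observed = rng.uniform(10.0, 110.0, size=1000)             # a 120 s method
rt_predicted = rt_observed + rng.normal(0.5, 3.0, size=1000)  # noisy predictions

mae = mean_absolute_error(rt_observed, rt_predicted)
lower, upper = central_error_interval(rt_observed, rt_predicted)
print(f"MAE = {mae:.2f} s; 95% of errors lie in [{lower:+.2f} s, {upper:+.2f} s]")
```

A percentile-based interval need not be symmetric around zero, which would be consistent with the asymmetric bounds reported in the abstract.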

Authors

  • Sofiia A Dymura
    Enamine Ltd. (www.enamine.net), Winston Churchill Street 78, Kyiv 02094, Ukraine.
  • Oleksandr O Viniichuk
    Enamine Ltd. (www.enamine.net), Winston Churchill Street 78, Kyiv 02094, Ukraine.
  • Kostiantyn P Melnykov
    Enamine Ltd. (www.enamine.net), Winston Churchill Street 78, Kyiv 02094, Ukraine.
  • Dmytro S Radchenko
    Enamine Ltd., Kyiv, Ukraine.
  • Oleksandr O Grygorenko
    Enamine Ltd., Kyiv, Ukraine.