Prediction of Retention Time by Combining Multiple Data Sets with Chromatographic Parameter Vectorization and Transfer Learning.

Journal: Analytical chemistry
Published Date:

Abstract

Retention time (RT) provides information orthogonal to mass spectra and thereby supports qualitative identification. However, RT depends on experimental conditions and column parameters, and large amounts of RT data are rarely available for a user's own experimental conditions. Various machine learning methods, including advanced deep learning approaches, have therefore been developed for RT prediction, but most are limited to a given column and set of operational conditions, and data sparsity often hinders prediction performance. In this study, we propose MDL-TL, a method that combines multiple data sets to jointly train a base model. MDL-TL vectorizes the column and operational conditions (chromatographic parameters, CPs) using word2vec and autoencoders, and distinguishes data sets from different chromatographic experiments by including the CPs in the compound representation. This not only augments the data but also introduces the CPs into RT prediction, allowing the pretrained model to be transferred efficiently to different target systems by fine-tuning. MDL-TL was evaluated against five popular deep learning approaches and four machine learning approaches on 14 reversed-phase liquid chromatography data sets and 14 hydrophilic interaction liquid chromatography data sets. The results show that, in most cases, our method surpassed the compared methods, including transfer learning methods based on the METLIN small molecule retention time (SMRT) data set, in mean absolute error, median absolute error, and mean relative error, demonstrating that MDL-TL is a promising approach for predicting RTs across various chromatographic systems and operational conditions.
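The data-combination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature values, and the two-dimensional CP vectors are hypothetical stand-ins for the word2vec and autoencoder embeddings, and the mean relative error is assumed here to be the mean of |predicted - observed| / observed, which the abstract does not define. The sketch only shows how appending a CP vector to each compound representation lets rows from different chromatographic experiments be pooled into one training table, and how the three reported error measures could be computed.

```python
from statistics import mean, median


def attach_cp(features, cp_vector):
    """Append one data set's CP vector to every compound feature vector."""
    return [list(f) + list(cp_vector) for f in features]


def pool_datasets(datasets):
    """Stack (features, cp_vector, rts) triples into one training table."""
    X, y = [], []
    for features, cp_vector, rts in datasets:
        X.extend(attach_cp(features, cp_vector))
        y.extend(rts)
    return X, y


def rt_errors(y_true, y_pred):
    """Mean absolute, median absolute, and mean relative error (assumed definitions)."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    rel_err = [e / t for e, t in zip(abs_err, y_true)]
    return {"mae": mean(abs_err), "medae": median(abs_err), "mre": mean(rel_err)}


# Hypothetical toy data: 2-d compound features, 2-d CP vectors, RTs in minutes.
dataset_a = ([[0.1, 0.9], [0.4, 0.6]], [1.0, 0.0], [2.0, 4.0])
dataset_b = ([[0.1, 0.9]], [0.0, 1.0], [5.0])

# The same compound appears in both data sets but gets distinct rows,
# because the appended CP vector differs between experiments.
X, y = pool_datasets([dataset_a, dataset_b])

# Hypothetical predictions, used only to exercise the error measures.
errors = rt_errors([2.0, 4.0, 5.0], [2.5, 3.0, 5.0])
```

In a transfer-learning setting, a base model would be trained on the pooled table and then fine-tuned on the rows of a single target system; that step is omitted here because the abstract does not specify the model architecture.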

Authors

  • Yansong Li
    State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, Institute of Zoonosis, and College of Veterinary Medicine, Jilin University, Changchun 130062, China.
  • Kunjie Dong
    School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China. Electronic address: kjdong@mail.dlut.edu.cn.
  • Di Yu
    State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China.
  • Dongdong Huang
    Department of Respiratory and Critical Care Medicine, Center for Respiratory Medicine, the Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China.
  • Xinyu Liu
    Institute of Medical Technology, Peking University Health Science Center, Beijing, China.
  • Guowang Xu
    CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China.
  • Xiaohui Lin
    School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China. Electronic address: datas@dlut.edu.cn.