MSSort-DIA: A deep learning classification tool of the peptide precursors quantified by OpenSWATH.

Journal: Journal of Proteomics
Published Date:

Abstract

OpenSWATH is an analysis toolkit commonly used for data-independent acquisition (DIA). Although the output of OpenSWATH is controlled at a 1% false discovery rate (FDR), the output report still contains many peptide precursors whose fragments have low similarity. At the last step of OpenSWATH peptide quantification, researchers usually need to manually check the similarity of the extracted ion chromatograms (XICs) of the fragments to distinguish high-confidence from low-confidence peptide precursors. Here we developed an algorithm with a graphical user interface, named MSSort-DIA, which combines a deep convolutional neural network (CNN) with a double-threshold segmentation process to automatically recognize high-confidence and low-confidence precursors. To train the MSSort-DIA model, we built a database containing about 50,000 manually classified peptide precursors acquired from different instrument platforms and different species. With the double-threshold segmentation strategy, MSSort-DIA can reduce the number of low-confidence peptides that require manual inspection to less than 10%, and it can be used as the last step of OpenSWATH to visualize and classify the MS/MS data of peptide precursors.

SIGNIFICANCE: Although the output of OpenSWATH is controlled at a 1% false discovery rate (FDR), the output report still contains many peptide precursors whose fragments have low similarity. At the last step of OpenSWATH peptide quantification, researchers usually need to manually check the similarity of the fragment XICs to distinguish high-confidence from low-confidence peptide precursors. However, manual inspection is inefficient. For instance, it takes about 50 h to sort even a small dataset of 1000 MS/MS spectra manually. In this paper we developed a software tool named MSSort-DIA to automatically recognize high-confidence precursors. We manually classified 50,000 peptide precursors as a training set for a convolutional neural network. Once training is finished, MSSort-DIA takes only a few minutes to automatically classify 20,000 peptide precursors, leaving a small portion of fuzzy ones for manual inspection. On the benchmark dataset, MSSort-DIA can significantly improve the efficiency and accuracy of recognizing high-confidence peptide precursors.
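
The double-threshold idea described above can be illustrated with a minimal sketch. The abstract does not state how the CNN output is scored or which threshold values MSSort-DIA uses, so everything below is an assumption made for illustration: the classifier is assumed to emit one score in [0, 1] per precursor, and the function name `double_threshold_sort` and the cut-offs 0.2 and 0.8 are hypothetical, not taken from the tool.

```python
from typing import List, Sequence


def double_threshold_sort(
    scores: Sequence[float],
    low: float = 0.2,   # hypothetical lower cut-off, not from the paper
    high: float = 0.8,  # hypothetical upper cut-off, not from the paper
) -> List[str]:
    """Assign each precursor score to 'high', 'low', or 'fuzzy'.

    Scores at or above `high` are treated as high confidence, scores at or
    below `low` as low confidence, and everything in between is left as
    'fuzzy' for manual inspection.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")

    labels = []
    for score in scores:
        if score >= high:
            labels.append("high")
        elif score <= low:
            labels.append("low")
        else:
            labels.append("fuzzy")
    return labels


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the paper.
    example_scores = [0.95, 0.50, 0.05, 0.80, 0.20]
    print(double_threshold_sort(example_scores))
```

Under these assumptions, only the precursors labelled "fuzzy" would need to be checked by hand, which is the portion the abstract reports as reduced to less than 10%.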

Authors

  • Yiming Li
    Department of Cardiology, West China Hospital, Sichuan University, Chengdu 610041, China.
  • Qingzu He
    Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China; Wenzhou Institute, University of Chinese Academy of Sciences, and Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang 325001, China.
  • Huan Guo
    Department of Occupational and Environmental Health, State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
  • Chuan-Qi Zhong
    State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen 361102, China.
  • Xiang Li
    Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States.
  • Yulin Li
    Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China.
  • Jiahuai Han
    State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China. Electronic address: jhan@xmu.edu.cn.
  • Jianwei Shuai
    Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China; Wenzhou Institute, University of Chinese Academy of Sciences, and Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang 325001, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China. Electronic address: jianweishuai@xmu.edu.cn.