Development of Peptide Identification System for ToF-SIMS Spectra Using Supervised Machine Learning.

Journal: Journal of the American Society for Mass Spectrometry
PMID:

Abstract

Interpreting time-of-flight secondary ion mass spectrometry (ToF-SIMS) data for organic materials is complicated because each molecule produces various fragment ions and because some mass peaks from different molecules overlap. Fragmentation mechanisms in SIMS are complex because different sputtering and ionization processes can occur simultaneously. Therefore, a prediction system that can identify the materials in a sample is required. A novel prediction system for peptides was developed based on ToF-SIMS spectra and amino-acid-based teaching information (labels) for supervised machine learning. To extend such a prediction system to general organic materials, annotating the materials is crucial for creating effective labels for supervised learning. Peptides are built from 20 kinds of amino acid residues, which can be used as labels. We previously developed a peptide prediction system using Random Forest, a supervised machine-learning method. However, that system predicted only which amino acids the target peptide contained and could not infer the amino acid sequence. In this study, the amino acid sequence of the test peptide was determined by adding information on pairs of adjacent amino acids to the labels. Once the prediction system had learned the spectra of the target peptides, those peptides could be identified in newly obtained ToF-SIMS spectra. The new prediction system also provides useful information for identifying unknown peptides. The prediction results indicate that permutations of two adjacent amino acids are effective teaching information for expressing the amino acid sequence of a peptide.
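
The labeling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so everything here is an assumption made for illustration: the one-letter amino acid codes, the binary presence/absence vector, the names `sequence_to_labels` and `active_labels`, and the example tripeptides. The sketch combines 20 single-residue labels with 400 ordered adjacent-pair labels, so two peptides with the same composition but a different residue order receive different label vectors.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (assumed label alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Label vocabulary: 20 single residues followed by 400 ordered adjacent pairs.
SINGLE_LABELS = list(AMINO_ACIDS)
PAIR_LABELS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
ALL_LABELS = SINGLE_LABELS + PAIR_LABELS


def sequence_to_labels(sequence):
    """Return a 0/1 vector marking which residues and ordered adjacent pairs occur."""
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    present = set(sequence)
    # Ordered pairs: "GA" (G followed by A) is a different label from "AG".
    present.update(sequence[i:i + 2] for i in range(len(sequence) - 1))
    return [1 if label in present else 0 for label in ALL_LABELS]


def active_labels(vector):
    """List the label names that are switched on in a label vector."""
    return [label for label, flag in zip(ALL_LABELS, vector) if flag]


if __name__ == "__main__":
    # Hypothetical tripeptides with identical composition but reversed order.
    for peptide in ("GAV", "VAG"):
        print(peptide, active_labels(sequence_to_labels(peptide)))
```

In a supervised setting, vectors like these could serve as multi-label targets for a classifier such as Random Forest, with the ToF-SIMS peak intensities as features. That training step is not shown here because the abstract does not specify the features or model settings.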

Authors

  • Satoka Aoyagi
    Department of Materials and Life Science, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino, Tokyo 180-8633, Japan.
  • Miya Fujita
    JSR Corporation, 100 Kawajiri-Cho, Yokkaichi, Mie 510-8552, Japan.
  • Hidemi Itoh
    Platform Laboratory for Science and Technology, Asahi Kasei Corporation, 2-1 Samejima, Fuji, Shizuoka 416-8501, Japan.
  • Hiroto Itoh
    Material Science Group, Data Generation Division, Data Science Center, Technology Development Headquarters, Konica Minolta, Inc., Tokyo 100-7015, Japan.
  • Takaharu Nagatomi
    Platform Laboratory for Science and Technology, Asahi Kasei Corporation, 2-1 Samejima, Fuji, Shizuoka 416-8501, Japan.
  • Masayuki Okamoto
    Analytical Science Research Laboratory, Kao Corp., Minato 1334, Wakayama-shi, Wakayama 640-8580, Japan.
  • Tomikazu Ueno
    JSR Corporation, 100 Kawajiri-Cho, Yokkaichi, Mie 510-8552, Japan.