Classification of subspecies based on MALDI-TOF MS protein profiles using machine learning models.
Journal: Microbiology Spectrum
PMID: 39162551
Abstract
The species studied here is an important bacterial starter culture for fermented foods; however, its two subspecies exhibit different properties in those foods. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is the gold standard for microbial fingerprinting, but its resolving power is limited to the species level. This study aimed to combine MALDI-TOF mass spectra with machine learning to develop a new method for identifying the two subspecies and distinguishing them from other (non-target) species. In total, 227 strains were collected, and 908 spectra were obtained via on- and off-plate protein extraction. Only 68.7% of strains were correctly identified at the subspecies level using the Biotyper database; however, the machine learning models showed a high level of performance. Partial least squares-discriminant analysis (PLS-DA), principal component analysis-K-nearest neighbor (PCA-KNN), and support vector machine (SVM) achieved accuracies of 0.823, 0.914, and 0.903, respectively, whereas random forest (RF) achieved an accuracy of 0.954 with an area under the receiver operating characteristic curve (AUROC) of 0.99, outperforming the other algorithms in distinguishing the subspecies. Machine learning proved to be a promising technique for the rapid, high-resolution classification of the subspecies using MALDI-TOF MS.