Machine Learning applied to MALDI-TOF data in a clinical setting: a systematic review
Journal: bioRxiv
Published Date: Apr 19, 2026
Abstract
Bacterial identification, antimicrobial resistance prediction, and strain typing are critical tasks in clinical microbiology, essential for guiding patient treatment and controlling the spread of infectious diseases. While Machine Learning (ML) has shown considerable promise in enhancing Matrix-Assisted Laser Desorption/Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) applications for these tasks, no comprehensive review currently addresses the topic from a technical ML perspective. To address this gap, we systematically reviewed 115 studies published between 2004 and 2025, focusing on key ML aspects: data size and balance, pre-processing pipelines, model selection and evaluation, and the availability of open data and code. Our analysis highlights the predominant use of classical ML models such as Random Forest and Support Vector Machines, alongside emerging interest in Deep Learning approaches for handling complex, high-dimensional data. Despite significant progress, challenges persist, including inconsistent pre-processing workflows, reliance on black-box models, limited external validation, and insufficient open-source resources; these hinder transparency, reproducibility, and broader adoption. This review offers actionable insights to enhance ML-driven bacterial diagnostics, advocating for standardized methodologies, greater transparency, and improved data accessibility. In addition, we provide guidelines on how to approach ML for MALDI-TOF MS analysis, helping researchers navigate key decisions in model development and evaluation.