ProteoBoostR: an interactive framework for supervised machine learning in clinical proteomics.
Journal: Clinical Proteomics
Published Date: Jan 24, 2026
Abstract
BACKGROUND: Mass spectrometry-based proteomics enables high-throughput quantification of thousands of proteins in clinical samples, fueling biomarker discovery for disease diagnosis and prognosis. However, leveraging complex proteomic profiles for predictive modeling often requires advanced machine learning (ML) expertise that many biomedical researchers lack. User-friendly tools are needed to apply state-of-the-art ML algorithms to proteomics data. XGBoost is a powerful tree-based ML algorithm known for high accuracy in classification tasks, and has been successfully used to classify cancer subtypes from multi-omics data.
METHODS: We developed ProteoBoostR, a Shiny application that streamlines supervised ML on protein abundance datasets. It allows researchers to train, evaluate and apply XGBoost classification models through an interactive web interface, without requiring coding.
RESULTS: We demonstrate the application of ProteoBoostR for the classification of proteomic subtypes across two independent datasets of glioblastoma multiforme, and for the detection of lung adenocarcinoma in serum. These application examples illustrate how ProteoBoostR can harness proteomic patterns for the stratification of patients.
CONCLUSIONS: ProteoBoostR is an open-source application that empowers proteomics researchers to perform advanced ML classification. It can be readily applied to other proteomic datasets and disease contexts, promoting reproducible ML analyses in proteomics and accelerating the translation of omics-based classifiers into clinical research.