Machine Learning Strategies to Tackle Data Challenges in Mass Spectrometry-Based Proteomics.

Journal: Journal of the American Society for Mass Spectrometry

Published Date: Jul 29, 2024

Abstract

In computational proteomics, machine learning (ML) has emerged as a vital tool for enhancing data analysis. Despite significant advancements, the diversity of ML model architectures and the complexity of proteomics data present substantial challenges in the effective development and evaluation of these tools. Here, we highlight the necessity for high-quality, comprehensive data sets to train ML models and advocate for the standardization of data to support robust model development. We emphasize the instrumental role of key data sets like ProteomeTools and MassIVE-KB in advancing ML applications in proteomics and discuss the implications of data set size on model performance, highlighting that larger data sets typically yield more accurate models. To address data scarcity, we explore algorithmic strategies such as self-supervised pretraining and multitask learning. Ultimately, we hope that this discussion can serve as a call to action for the proteomics community to collaborate on data standardization and collection efforts, which are crucial for the sustainable advancement and refinement of ML methodologies in the field.

Authors

Ceder Dens

Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium.

Charlotte Adams

Department of Computer Science, University of Antwerp, Antwerp, Belgium.

Kris Laukens

Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium; Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Network Antwerp (Biomina), University of Antwerp, Antwerp, Belgium.

Wout Bittremieux

Department of Computer Science, University of Antwerp, Antwerp, Belgium. Email: wout.bittremieux@uantwerpen.be

Keywords

Algorithms; Databases, Protein; Humans; Machine Learning; Mass Spectrometry; Proteomics

External Resources

PubMed ID: 39074335
