π-MSNet: A billion-scale, AI-ready living proteomics data portal

Journal: bioRxiv
Published Date:

Abstract

Artificial intelligence (AI) is reshaping proteomics workflows, delivering remarkable gains in both peptide identification sensitivity and quantitative performance. However, the potential of deep learning models in proteomics has not been fully exploited because large-scale, high-quality, and consistently labeled datasets are scarce. Here, we present π-MSNet, a billion-scale, AI-ready living mass spectrometry (MS) data portal. Built with a uniform identification and quality control workflow, it comprises over 1.66 billion MS/MS spectra, 501 million peptide-spectrum matches (PSMs), and 9 million precursors from 36,356 LC-MS/MS runs across ten instrument types and 55 diverse species. Through community collaboration, the data are shared via international, interactive, and living web resources. Enabled by the built-in MSNetLoader Python API for seamless and scalable data access, with native support for PyTorch and TensorFlow, π-MSNet provides an AI-ready data framework for efficient training and systematic benchmarking of multiple models across three representative tasks: MS/MS spectrum prediction, retention time prediction, and de novo peptide sequencing. In particular, by retraining multiple models on π-MSNet, we achieved consistent performance improvements over their original versions. These improved models were subsequently integrated into the π-MSNet agent to enable interactive, deployment-free use. Through SDRF (Sample and Data Relationship Format) metadata, an open-source cloud analysis workflow, and a community-driven interactive data portal that supports continuous data submission, π-MSNet serves as a living, AI-ready resource for reproducible benchmarking, robust model training, and accelerated AI innovation in proteomics.

Authors

  • Dai, C.
  • Liu, Y.
  • Ling, T.
  • Qiu, Y.
  • Xu, H.
  • Zhang, Q.
  • Huang, X.
  • Zhu, Y.
  • Sachsenberg, T.
  • Bai, M.
  • He, F.
  • Perez-Riverol, Y.
  • Xie, L.
  • Chang, C.

Categories