A transformer model for de novo sequencing of data-independent acquisition mass spectrometry data.

Journal: Nature Methods
Published Date:

Abstract

A core computational challenge in the analysis of mass spectrometry data is the de novo sequencing problem, in which the generating amino acid sequence is inferred directly from an observed fragmentation spectrum without the use of a sequence database. Recently, deep learning models have made substantial advances in de novo sequencing by learning from massive datasets of high-confidence labeled mass spectra. However, these methods are designed primarily for data-dependent acquisition experiments. Over the past decade, the field of mass spectrometry has been moving toward using data-independent acquisition (DIA) protocols for the analysis of complex proteomic samples owing to their superior specificity and reproducibility. Hence, we present a de novo sequencing model called Cascadia, which uses a transformer architecture to handle the more complex data generated by DIA protocols. In comparisons with existing approaches for de novo sequencing of DIA data, Cascadia achieves substantially improved performance across a range of instruments and experimental protocols.
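To make the de novo sequencing problem concrete, here is a minimal sketch of the relationship between a peptide and its fragment ions. It is not Cascadia's method and does not address DIA data. It assumes an unmodified peptide, singly charged b- and y-ions, and a complete, noise-free ion ladder, and uses standard monoisotopic residue masses. The function names `fragment_mz` and `read_b_ladder` are hypothetical, chosen for illustration only.

```python
# Standard monoisotopic residue masses in daltons.
# L is listed before I so that the isobaric pair is always reported as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (charge carrier)
WATER = 18.010565   # mass of H2O carried by y-ions


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # prefixes of length 1 .. n-1
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # suffixes of length 1 .. n-1
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def read_b_ladder(b_ions, tol=0.01):
    """Read residues off a complete b-ion ladder via consecutive mass differences.

    Assumption: the ladder is complete and noise-free, which real spectra
    (and DIA spectra in particular) generally are not.
    """
    residues = []
    previous = PROTON
    for mz in b_ions:
        delta = mz - previous
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append(matches[0] if matches else "?")
        previous = mz
    return "".join(residues)


b_ions, y_ions = fragment_mz("GASK")
prefix = read_b_ladder(b_ions)  # the b-ladder covers all residues except the last
```

Each gap between consecutive b-ions equals one residue mass, which is why a sequence can in principle be read directly from a spectrum. Missing peaks, noise, and co-fragmenting peptides break this simple picture, which motivates learned models.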

Authors

  • Justin Sanders
    Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
  • Bo Wen
  • Paul A Rudnick
    Spectragen Informatics LLC, Bainbridge Island, WA, USA.
  • Richard S Johnson
    Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Christine C Wu
    Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Michael Riffle
    Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Sewoong Oh
    Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
  • Michael J MacCoss
    Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • William Stafford Noble
    [1] Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA. [2] Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, Washington 98195-5065, USA.
