Complete NMR assignment for 275 of the most common dipeptides in intrinsically disordered proteins

Journal: bioRxiv
Published Date:

Abstract

Accurate NMR chemical shift assignments are essential for atomic-resolution characterization of proteins. For intrinsically disordered proteins (IDPs) and regions (IDRs) in particular, however, assignment remains a labor-intensive task because of spectral overlap and conformational heterogeneity. Consequently, complete side-chain assignments are rare. Here, we present a comprehensive reference dataset comprising the complete NMR chemical shift assignments for 275 of the most prevalent dipeptides in the IDPome, covering 93% of it. The dataset contains all NMR-accessible backbone and side-chain nuclei, 9 408 validated data points in total, as well as the 1D (1H, 13C) and 2D (1H-15N HSQC, 1H-13C HSQC, TOCSY, NOESY, 1H-13C HMBC) spectra used for assignment. This makes it a rich resource for training, testing, and benchmarking tools for data-driven protein assignment, peak picking, and synthetic spectrum generation. To facilitate such machine learning applications, all data are delivered in standardized, machine-readable formats.

Authors

  • Tobias Rindfleisch; Emilie Fjeldberg Taule; Markus S. Miettinen; Jarl Underhaug