RNA sequence design and protein–DNA specificity prediction with NA-MPNN

Journal: bioRxiv
Published Date:

Abstract

RNA sequence design and protein–DNA binding specificity prediction can both be framed as nucleic acid inverse-folding problems: finding the most likely nucleic acid sequences given a fixed three-dimensional structure of a nucleic acid or nucleic acid–protein complex. While task-specific tools have been developed, no unified deep learning model for nucleic acid inverse folding has been described; a single model would have larger and more diverse datasets available for training and a considerably greater range of applicability. Here we introduce Nucleic Acid MPNN (NA-MPNN), a message-passing neural network that treats proteins, DNA, and RNA within a unified biopolymer graph representation. NA-MPNN outperforms previous methods on RNA sequence design and fixed-dock protein–DNA specificity prediction, and should be broadly useful for de novo RNA structure design and prediction of DNA-binding specificity.

Authors

  • Andrew Kubaney; Andrew Favor; Lilian McHugh; Raktim Mitra; Robert Pecoraro; Justas Dauparas; Cameron Glasscock; David Baker