RNA sequence design and protein–DNA specificity prediction with NA-MPNN
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
RNA sequence design and protein–DNA binding specificity prediction can both be framed as nucleic acid inverse-folding problems: finding the most likely nucleic acid sequences given a fixed three-dimensional structure of a nucleic acid or nucleic acid–protein complex. While task-specific tools have been developed, no unified deep learning model for nucleic acid inverse folding has been described; a single model would have larger and more diverse datasets available for training and a considerably greater range of applicability. Here we introduce Nucleic Acid MPNN (NA-MPNN), a message-passing neural network that treats proteins, DNA, and RNA within a unified biopolymer graph representation. NA-MPNN outperforms previous methods on RNA sequence design and fixed-dock protein–DNA specificity prediction, and should be broadly useful for de novo RNA structure design and prediction of DNA-binding specificity.
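To make the idea of a unified biopolymer graph concrete, the following is a minimal sketch, not the NA-MPNN implementation. It assumes each residue or nucleotide is a node carrying a polymer-type label and the coordinates of one representative atom, and that edges connect each node to its k nearest neighbors regardless of polymer type. The names `Node`, `POLYMER_TYPES`, and `knn_edges`, the choice of k, and the toy coordinates are all hypothetical; the abstract does not specify how the graph is featurized.

```python
import math
from dataclasses import dataclass

# Hypothetical polymer-type labels; the abstract does not describe the real encoding.
POLYMER_TYPES = ("protein", "dna", "rna")


@dataclass(frozen=True)
class Node:
    polymer: str  # one of POLYMER_TYPES
    xyz: tuple    # coordinates of one representative atom (an assumption)


def knn_edges(nodes, k):
    """Return directed edges (i, j) where j is one of the k nodes nearest to i."""
    edges = []
    for i, a in enumerate(nodes):
        # Rank every other node by Euclidean distance to node i; ties break by index.
        others = sorted(
            (j for j in range(len(nodes)) if j != i),
            key=lambda j: (math.dist(a.xyz, nodes[j].xyz), j),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


# Toy complex: two protein residues and two DNA nucleotides placed on a line.
nodes = [
    Node("protein", (0.0, 0.0, 0.0)),
    Node("protein", (1.0, 0.0, 0.0)),
    Node("dna", (2.5, 0.0, 0.0)),
    Node("dna", (4.5, 0.0, 0.0)),
]
edges = knn_edges(nodes, k=2)

# Edges that cross polymer types are what let a single graph carry
# protein-to-nucleic-acid context into the per-node predictions.
cross_type = [(i, j) for i, j in edges if nodes[i].polymer != nodes[j].polymer]
print(cross_type)
```

In this sketch the neighbor search ignores polymer type entirely, so protein and nucleic acid nodes share one edge set; a message-passing model operating on such a graph could then condition nucleotide predictions on nearby protein residues.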