Predicting protein-nucleic acid interactions via protein language models with biophysical and evolutionary priors.
Journal:
iScience
Published Date:
Apr 17, 2026
Abstract
Protein interactions with nucleic acids are fundamental to numerous biological processes. Here, we present PNABPred, a multi-modal framework that integrates biophysical and evolutionary priors into a protein language model to predict protein-nucleic acid interactions. Specifically, semantic representations are combined with classical evolutionary representations and biophysical representations. Our results showed that PNABPred outperforms other state-of-the-art sequence-based methods. In RNA-binding protein classification tasks, PNABPred achieved an Matthews correlation coefficient (MCC) of 0.889 and area under the receiver operating characteristic curve (AUROC) of 0.990, 17.44% and 3.23% higher than the next best method (Seq-RBPPred), respectively. In DNA-binding site prediction, PNABPred also outperformed the transformer-based method CLAPE-DB by 20.31% and 4.65% (MCC 0.468; AUROC 0.922) on the Test_129 dataset. PNABPred employs only protein sequences as inputs, identifying nucleic acid binding sites even in intrinsically disordered regions. This framework supports scalable sequence screening and annotation of nucleic acid-binding proteins for basic research, biotechnology, and therapeutic development applications.
Authors
Keywords
No keywords available for this article.