ProSiteHunter: A Unified Framework for Sequence-Based Prediction of Protein-Nucleic Acid and Protein-Protein Binding Sites.

Journal: Advanced science (Weinheim, Baden-Wurttemberg, Germany)
Published Date:

Abstract

Accurate identification of protein binding sites is essential for elucidating protein function, decoding molecular recognition, and guiding drug design. However, existing sequence-based approaches are often designed for specific binding-site types and therefore lack generality, whereas structure-based methods typically rely on high-quality structural models, which limits their applicability. Here, we present ProSiteHunter, a unified sequence-based framework for predicting protein binding sites spanning protein-DNA, protein-RNA, protein-protein, and antibody-antigen interfaces. ProSiteHunter integrates the fine-tuned protein language model SiteT5 with evolutionary, geometric, and statistical features extracted from sequences. These representations are further processed through a Multi-Source Feature Fusion (MSFF) module, which captures bidirectional semantics, local associations, and global dependencies to achieve a comprehensive characterization of binding sites, thereby substantially improving predictive accuracy and generalization capability. Across comprehensive benchmarks, ProSiteHunter achieved a 38.4% average improvement in the area under the precision-recall curve (PRAUC) for protein-DNA/RNA/protein tasks and a 15.1% PRAUC improvement on the particularly challenging antibody-antigen task over state-of-the-art methods. Moreover, ProSiteHunter can identify local flexible sites that complement AlphaFold3 predictions and can improve the accuracy of antibody-antigen interaction prediction. These results highlight ProSiteHunter as an efficient and unified approach for accurate and robust prediction of diverse protein binding sites.
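The headline metric above, PRAUC, is computed per residue from binding-site labels and predicted scores. The abstract does not say which estimator the authors used, so the following is only a minimal sketch of one common choice, average precision (a step-wise area under the precision-recall curve). The function name `average_precision` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: per-residue ground truth, 1 = binding site, 0 = non-binding.
    scores: per-residue predicted binding propensity (higher = more likely).
    Assumption: this is one common PRAUC estimator, not necessarily the one
    used in the paper. Tied scores are ranked in input order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive (binding) residue is required")

    # Rank residues from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Toy example (hypothetical values): 8 residues, 3 of them binding.
toy_labels = [0, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.10, 0.90, 0.80, 0.30, 0.70, 0.60, 0.20, 0.05]
toy_ap = average_precision(toy_labels, toy_scores)
print(f"average precision: {toy_ap:.4f}")
```

Binding residues are usually a small minority of a sequence, which is why a precision-recall summary is often preferred over ROC-based ones for this task: a predictor that ranks residues at random scores roughly the fraction of positives (3/8 in the toy example) rather than 0.5.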

Authors
