PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design
Journal: arXiv
Published Date: Jun 13, 2025
Abstract
Designing protein-binding proteins with high affinity is critical in
biomedical research and biotechnology. Despite recent advancements targeting
specific proteins, the ability to create high-affinity binders for arbitrary
protein targets on demand, without extensive rounds of wet-lab testing, remains
a significant challenge. Here, we introduce PPDiff, a diffusion model to
jointly design the sequence and structure of binders for arbitrary protein
targets in a non-autoregressive manner. PPDiff builds upon our
Sequence Structure Interleaving Network with Causal attention layers (SSINC),
which integrates interleaved self-attention layers to capture global amino acid
correlations, k-nearest neighbor (kNN) equivariant graph layers to model local
interactions in three-dimensional (3D) space, and causal attention layers to
simplify the intricate interdependencies within the protein sequence. To assess
PPDiff, we curate PPBench, a general protein-protein complex dataset comprising
706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on
PPBench and finetuned on two real-world applications: target-protein mini-binder
complex design and antigen-antibody complex design. PPDiff consistently
surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and
16.89% for the pretraining task and the two downstream applications,
respectively.
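
As a rough illustration of two building blocks named above, the following minimal sketch shows how a k-nearest-neighbor graph over 3D residue coordinates and a causal attention mask might be constructed. It is not the SSINC implementation: the function names `knn_edges` and `causal_mask`, the toy coordinates, the choice of k, and the lower-triangular form of the mask are all assumptions made for illustration only.

```python
import math

def knn_edges(coords, k):
    """For each residue i, return the indices of its k nearest other residues.

    coords is a list of (x, y, z) tuples, e.g. hypothetical C-alpha positions.
    """
    n = len(coords)
    neighbors = []
    for i in range(n):
        # Euclidean distance from residue i to every other residue
        dists = [(math.dist(coords[i], coords[j]), j) for j in range(n) if j != i]
        dists.sort()  # ascending by distance, ties broken by index
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

# Toy input: five hypothetical residues evenly spaced along the x axis
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0),
          (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
neighbors = knn_edges(coords, k=2)
mask = causal_mask(3)
```

In this sketch, the kNN lists restrict message passing to spatially local residues, while the mask restricts each sequence position to attending only to itself and earlier positions. How PPDiff combines these with the interleaved self-attention layers is not specified in this abstract.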