Ligand-guided Sequence-structure Co-design of De Novo Functional Enzymes
Journal: bioRxiv
Published Date: Mar 4, 2026
Abstract
Proteins underpin essential biological functions across all kingdoms of life. The capacity to design novel proteins with tailored activities holds transformative potential for biotechnology, medicine, and sustainability. However, protein functions, particularly enzymatic activities, depend on precise interactions with small-molecule ligands, and accurately modeling these interactions remains a formidable challenge in de novo protein design. Here, we present ProteinNet, a large-scale generative model for the simultaneous co-design of protein sequence and structure with ligand-guided functional targeting. ProteinNet integrates Transformer-based sequence modeling with equivariant graph neural networks for 3D structural representation and an explicit protein-ligand interaction module. This architecture enables targeted design from heterogeneous inputs, including binding ligands, functionally important residues, and evolutionary taxonomy. ProteinNet is a 730-million-parameter foundation model trained on 720,993 protein-ligand complexes using multi-task learning objectives that encompass sequence, structure, and protein-ligand interaction prediction. In rigorous in silico benchmarks, ProteinNet consistently outperforms state-of-the-art baselines, including Inpainting, RFdiffusion/ProteinMPNN, and AlphaFold3/LigandMPNN, as measured by enzyme-substrate prediction scores, AlphaFold2 confidence metrics, and structural fidelity. We further experimentally validated ProteinNet across multiple enzyme families, including chloramphenicol acetyltransferase, aminoglycoside adenylyltransferase, and thiopurine S-methyltransferase. De novo enzymes generated by our fine-tuned ProteinNet exhibited catalytic activities comparable to or exceeding those of natural enzymes, while retaining substantial novelty, with sequence identities as low as 51.6%.
These results establish ProteinNet as a robust artificial intelligence-based platform for functional enzyme design, demonstrating the power of large protein foundation models to create high-performance, novel biocatalysts.