Prot42: a Novel Family of Protein Language Models for Target-aware Protein Binder Generation
Journal: arXiv
Published Date: Apr 6, 2025
Abstract
Unlocking the next generation of biotechnology and therapeutic innovation
demands overcoming the inherent complexity and resource intensity of
conventional protein engineering methods. Recent GenAI-powered computational
techniques often rely on the availability of the target protein's 3D structures
and specific binding sites to generate high-affinity binders, constraints
exhibited by models such as AlphaProteo and RFdiffusion. In this work, we
explore the use of Protein Language Models (pLMs) for high-affinity binder
generation. We introduce Prot42, a novel family of pLMs pretrained on vast
amounts of unlabeled protein sequences. By capturing
deep evolutionary, structural, and functional insights through an advanced
auto-regressive, decoder-only architecture inspired by breakthroughs in natural
language processing, Prot42 expands the capabilities of language-only
computational protein design. Our models handle sequences of up to 8,192
amino acids, significantly surpassing standard context-length limitations
and enabling precise modeling of large proteins and complex multi-domain
sequences. In practical applications, Prot42
excels in generating high-affinity protein binders and sequence-specific
DNA-binding proteins. Our models are publicly available, offering
the scientific community an efficient and precise computational toolkit for
rapid protein engineering.