Diffusion Model with Representation Alignment for Protein Inverse Folding
Journal: arXiv
Published Date: Dec 12, 2024
Abstract
Protein inverse folding is a fundamental problem in bioinformatics, aiming to
recover the amino acid sequences from a given protein backbone structure.
Despite their success, existing methods struggle to fully capture the
intricate inter-residue relationships critical for accurate sequence
prediction. We propose DMRA, a diffusion model with representation alignment
that enhances diffusion-based inverse folding
by (1) introducing a shared center that aggregates contextual information from
the entire protein structure and selectively distributes it to each residue;
and (2) aligning noisy hidden representations with clean semantic
representations during the denoising process. This is achieved through predefined
semantic representations for amino acid types and a representation alignment
method that uses type embeddings as semantic feedback to normalize each
residue. Extensive evaluations on the CATH4.2 dataset show that DMRA
outperforms leading methods, achieving state-of-the-art performance, and that
it generalizes well to the TS50 and TS500 datasets.
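
The two ideas named in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation: the abstract does not specify how the shared center aggregates or distributes information, nor the exact form of the alignment. The sketch assumes mean pooling for aggregation, a sigmoid gate for selective distribution, and a cosine-based alignment loss as a stand-in for the type-embedding feedback. All function names are hypothetical.

```python
import math


def mean_pool(residue_feats):
    """Aggregate per-residue features into one 'shared center' vector.

    Assumption: simple mean pooling; the paper may aggregate differently.
    """
    n = len(residue_feats)
    dim = len(residue_feats[0])
    return [sum(f[d] for f in residue_feats) / n for d in range(dim)]


def distribute(residue_feats, center):
    """Add the center back to each residue, scaled by a per-residue gate.

    Assumption: the gate is a sigmoid of the residue-center dot product,
    so residues more similar to the center receive more of it.
    """
    updated = []
    for f in residue_feats:
        score = sum(a * b for a, b in zip(f, center))
        score = max(-60.0, min(60.0, score))  # keep math.exp in range
        gate = 1.0 / (1.0 + math.exp(-score))
        updated.append([a + gate * c for a, c in zip(f, center)])
    return updated


def cosine(u, v, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def alignment_loss(hidden, type_embeddings, labels):
    """Mean (1 - cosine) between each hidden vector and the embedding
    of its amino acid type.

    Assumption: a cosine loss stands in for the paper's alignment method.
    """
    losses = [1.0 - cosine(h, type_embeddings[y]) for h, y in zip(hidden, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Three residues with 2-dimensional toy features.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    center = mean_pool(feats)
    mixed = distribute(feats, center)
    # Toy 'predefined' type embeddings for two amino acid types.
    type_embeddings = {"A": [1.0, 0.0], "G": [0.0, 1.0]}
    print(alignment_loss(mixed, type_embeddings, ["A", "G", "A"]))
```

In this toy setting, the loss is near zero when each hidden vector points in the same direction as its type embedding, and grows as the two diverge, which is the sense in which clean type embeddings can act as feedback on noisy hidden states.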