RNAGenesis: A Generalist Foundation Model for Functional RNA Therapeutics
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
RNA molecules are central to gene regulation, catalysis, and molecular recognition, and they offer broad opportunities for therapeutic applications. However, uncovering their complex sequence-structure-function relationships, particularly for non-coding RNAs, remains a formidable challenge. Here, we introduce RNAGenesis, a generalist RNA foundation model that integrates sequence representation, structure prediction, and de novo functional design within a single generative framework. Trained on a diverse, clustered corpus of non-coding RNAs, RNAGenesis combines a BERT-style encoder, query-based latent compression, and a diffusion-guided decoder, which is further enhanced by inference-time alignment through gradient guidance and beam search. In comprehensive evaluations, RNAGenesis achieves state-of-the-art performance on 11 of the 13 tasks in the BEACON benchmark and surpasses structure-aware models in inverse folding, 3D structure prediction, and de novo structure design. We also introduce RNATx-Bench, a dedicated benchmark for RNA therapeutics comprising over 100,000 experimentally validated sequences. On this benchmark, RNAGenesis shows strong predictive performance across ASOs, siRNAs, shRNAs, circRNAs, and untranslated region (UTR) variants. RNAGenesis further enables functional RNA design, including aptamers targeting IGFBP3 and structurally constrained sgRNA scaffolds. Wet-lab validation confirms aptamer binding with dissociation constants (KD) as low as 4.02 nM, and the designed sgRNA scaffolds improve editing efficiency by up to 2.5-fold across CRISPR-Cas9, base editing, and prime editing systems. These results position RNAGenesis as a next-generation, general-purpose RNA foundation model with broad utility for both computational modeling and experimental therapeutic design.
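To make the idea of "query-based latent compression" more concrete, the following is a minimal, illustrative sketch. It is not the RNAGenesis implementation. It assumes a Perceiver/Q-Former-style design in which a fixed set of query vectors cross-attends over the encoder's per-nucleotide embeddings, so sequences of any length are pooled into the same number of latent vectors. For simplicity, the sketch makes several assumptions. The names `compress`, `softmax`, and `dot` are hypothetical. The queries are random rather than learned. The token embeddings serve directly as both keys and values, with no learned projections or multiple heads.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def compress(token_embs, queries):
    """Hypothetical query-based compression: each query attends over all
    token embeddings (keys = values = tokens, an assumption of this sketch)
    and returns one pooled vector, giving len(queries) x d latents."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    latents = []
    for q in queries:
        # Scaled dot-product attention weights of this query over every token.
        weights = softmax([dot(q, t) * scale for t in token_embs])
        # Attention-weighted sum of token embeddings, one value per dimension.
        latents.append([sum(w * t[j] for w, t in zip(weights, token_embs))
                        for j in range(d)])
    return latents

# Usage: sequences of different lengths map to the same latent shape.
rng = random.Random(0)
d, num_queries = 8, 4
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_queries)]
for length in (20, 75):
    tokens = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(length)]
    z = compress(tokens, queries)
    print(length, "->", len(z), "x", len(z[0]))  # prints "20 -> 4 x 8", then "75 -> 4 x 8"
```

A fixed-size latent like this gives a downstream generator, such as a diffusion decoder, a constant-shaped space to operate in regardless of RNA length. That is one plausible motivation for this kind of bottleneck.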