RNAElectra: An ELECTRA-Style RNA Foundation Model for RNA Regulatory Inference
Journal: bioRxiv
Published Date: Mar 17, 2026
Abstract
RNA regulation governs gene expression through sequence-encoded mechanisms such as RNA structure formation, protein binding, chemical modification, and RNA-RNA targeting, with regulatory rules spanning both nucleotide-scale motifs and longer-range context. RNA foundation models seek to learn transferable representations from large RNA corpora, but most existing approaches rely on masked language modeling (MLM), in which the training loss is computed on only a subset of positions and the model is pretrained on artificially corrupted inputs that do not appear during downstream inference, creating a mismatch between pretraining and fine-tuning. Here, we introduce RNAElectra, a single-nucleotide-resolution RNA foundation model pretrained on diverse non-coding RNAs from RNAcentral using ELECTRA-style replaced-token detection (RTD). Unlike MLM, RTD trains a discriminator with a loss defined over all input positions on realistically corrupted sequences, providing dense supervision and better alignment with downstream sequence-to-function prediction tasks. RNAElectra combines nucleotide-resolution tokenization with an efficient attention design to capture both local regulatory motifs and longer-range dependencies within a single reusable backbone. Using a unified, sequence-only fine-tuning pipeline without task-specific architectures or auxiliary inputs, RNAElectra shows strong cross-task generalization across benchmarks spanning RNA structure and function, RNA-protein and RNA-RNA interactions, RNA modification landscapes, and quantitative regulatory readouts such as translation efficiency and mRNA stability, outperforming widely used RNA foundation model baselines on most evaluated tasks. Beyond predictive accuracy, RNAElectra also supports interpretability by enabling analysis of learned representations and sequence determinants underlying model predictions. 
Together, these results establish RTD pretraining as a practical alternative to MLM for RNA foundation modeling and position RNAElectra as a reusable backbone for RNA regulatory prediction, sequence-level RNA engineering, and design.
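The contrast the abstract draws between MLM and RTD supervision can be illustrated with a toy sketch. This is not the RNAElectra implementation. The 15% corruption rate, the function names, and the example sequence are assumptions made for illustration, and uniform random nucleotide sampling stands in for the learned generator network that an ELECTRA-style setup would normally use. The sketch only shows which positions receive a training signal under each objective, and that RTD inputs contain no artificial mask token.

```python
import random

NUCLEOTIDES = "ACGU"
MASK = "[MASK]"  # artificial token that never appears at inference time


def mlm_corrupt(seq, rate, rng):
    """MLM-style corruption: mask a random subset of positions.

    Returns the corrupted tokens and the masked positions, which are
    the only positions that contribute to an MLM loss.
    """
    tokens = list(seq)
    n = max(1, round(rate * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), n))
    for i in positions:
        tokens[i] = MASK
    return tokens, positions


def rtd_corrupt(seq, rate, rng):
    """RTD-style corruption: replace a random subset with real nucleotides.

    Uniform sampling is a hypothetical stand-in for a learned generator.
    Returns the corrupted tokens and one binary label per position
    (1 = replaced, 0 = original), so every position is supervised.
    A sampled nucleotide that matches the original is labeled 0.
    """
    tokens = list(seq)
    n = max(1, round(rate * len(seq)))
    for i in rng.sample(range(len(seq)), n):
        tokens[i] = rng.choice(NUCLEOTIDES)
    labels = [int(tok != orig) for tok, orig in zip(tokens, seq)]
    return tokens, labels


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "GGGAAACUUCGGUUUCCC"  # made-up example sequence
    mlm_tokens, mlm_positions = mlm_corrupt(seq, 0.15, rng)
    rtd_tokens, rtd_labels = rtd_corrupt(seq, 0.15, rng)
    print("MLM supervised positions:", len(mlm_positions), "of", len(seq))
    print("RTD supervised positions:", len(rtd_labels), "of", len(seq))
    print("RTD input:", "".join(rtd_tokens))
```

In this sketch the MLM loss would be defined on only the handful of masked positions, while the RTD discriminator receives a label at every nucleotide and sees inputs drawn entirely from the normal A/C/G/U vocabulary.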