Fine scale structural information substantially improves multivariate regression model for mRNA in-vial degradation prediction

Journal: bioRxiv
Published Date:

Abstract

The success of COVID-19 mRNA vaccines has made optimizing the in-solution stability of mRNAs a key objective. However, we still lack a complete understanding of the sequence metrics that influence mRNA in-solution stability. RNA secondary structure plays a critical role in protecting against hydrolysis, the primary degradation pathway under storage conditions. Yet the structural metrics that best guide stability-focused mRNA design remain incompletely defined. Global metrics such as minimum free energy and average unpaired probability have been used to improve mRNA stability but fail to capture local structural variation relevant to RNA degradation. We demonstrate that base-pairing log odds provide fine-scale, orthogonal insight that complements global metrics and improves stability modeling. Further, by combining local and global features into a parsimonious four-feature regression model, dubbed Stability Regression Analysis using Nucleotide-Derived features (STRAND), we achieve a greater than two-fold reduction in prediction error compared to existing machine learning and deep learning approaches and demonstrate robust generalization across diverse transcript contexts. This compact and interpretable model provides an accurate and reliable framework for predicting mRNA in-solution stability.
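
The contrast between a global metric and a fine-scale one can be illustrated with a minimal sketch. It assumes that "base-pairing log odds" means the per-nucleotide logit log(p / (1 - p)) of a pairing probability p; the abstract does not define the quantity, so this definition, the function names, and the toy probabilities are all illustrative assumptions rather than the authors' implementation.

```python
import math

def pairing_log_odds(p_paired, eps=1e-6):
    """Per-nucleotide log odds of being paired, log(p / (1 - p)).

    Assumed definition for illustration only. Probabilities are
    clipped to [eps, 1 - eps] to avoid division by zero and log(0).
    """
    out = []
    for p in p_paired:
        p = min(max(p, eps), 1.0 - eps)
        out.append(math.log(p / (1.0 - p)))
    return out

def average_unpaired_probability(p_paired):
    """Global metric: mean of (1 - p) over all nucleotides."""
    return sum(1.0 - p for p in p_paired) / len(p_paired)

# Two hypothetical pairing-probability profiles (toy values, not real data).
uniform = [0.5, 0.5, 0.5, 0.5]   # every position equally uncertain
mixed = [0.9, 0.1, 0.9, 0.1]     # alternating well-paired and exposed positions

aup_uniform = average_unpaired_probability(uniform)
aup_mixed = average_unpaired_probability(mixed)
lo_uniform = pairing_log_odds(uniform)
lo_mixed = pairing_log_odds(mixed)

print(aup_uniform, aup_mixed)  # the global averages coincide
print(lo_uniform)              # flat profile
print(lo_mixed)                # strongly position-dependent profile
```

In this toy case both profiles share the same average unpaired probability, while the per-nucleotide log odds separate the well-paired positions from the exposed ones. That is the kind of local variation a single global average cannot express.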

Authors

  • Yi, S.
  • Ali, S.
  • Jadeja, Y.
  • Davis, J. W.
  • Metkar, M.

Categories