Probabilistic Modelling of Prime Editing Variant Correction Efficiency
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Prime editing has emerged as a versatile genome editing technology capable of installing precise genetic modifications without requiring double-strand breaks or donor templates. However, designing pegRNAs with high editing efficiency remains a challenge due to the complex interplay of sequence features that affect editing outcomes. Current approaches predominantly provide point predictions without capturing the inherent uncertainty in editing efficiency, which limits risk assessment and decision-making in pegRNA design. Here, we present crispAIPE, a transformer-based probabilistic framework for predicting prime editing variant correction efficiency with uncertainty quantification. Our approach models editing outcomes in a 3D simplex space, enabling comprehensive uncertainty estimation while achieving superior predictive performance compared to existing models. crispAIPE leverages transformer encoders to capture long-range sequence dependencies and contextual relationships, surpassing existing models in the point estimate prediction task. The model also predicts efficiency distributions for all edit types, including single nucleotide replacements, insertions, and deletions. Trained on 73,939 pegRNAs across multiple cell lines, crispAIPE achieves an overall Spearman correlation of 0.881 and a Pearson correlation of 0.894 across all outcome types, while providing calibrated uncertainty estimates that allow risk-sensitive pegRNA selection. Additionally, we identify key sequence motifs and positional features that drive editing efficiency, providing interpretable insights into the sequence determinants of prime editing. We demonstrate that uncertainty-aware predictions could significantly improve pegRNA design outcomes, with high-confidence predictions showing higher success rates than low-confidence designs.
crispAIPE represents the first probabilistic deep learning framework for prime editing, bridging the gap between predictive accuracy and uncertainty quantification to enable more reliable and interpretable pegRNA design. The source code and example data are available at https://github.com/furkanozdenn/pe-uncert.
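To make the idea of risk-sensitive pegRNA selection on a 3D simplex concrete, the following is a minimal illustrative sketch and not the crispAIPE implementation. It assumes, purely for illustration, that a model outputs Dirichlet concentration parameters over three outcome fractions (the abstract does not specify the distribution family or name the three outcomes), and that candidates are ranked by a lower-confidence-bound score on the first component. The function names, the candidate names, and the parameter values are all hypothetical.

```python
from math import sqrt


def dirichlet_summary(alpha):
    """Return per-component means and standard deviations of Dirichlet(alpha)."""
    a0 = sum(alpha)
    means = [a / a0 for a in alpha]
    # Closed-form marginal variance: a_i * (a0 - a_i) / (a0^2 * (a0 + 1))
    stds = [sqrt(a * (a0 - a) / (a0 ** 2 * (a0 + 1))) for a in alpha]
    return means, stds


def risk_adjusted_score(alpha, edited_index=0, k=1.0):
    """Hypothetical score: predicted mean efficiency minus k standard deviations."""
    means, stds = dirichlet_summary(alpha)
    return means[edited_index] - k * stds[edited_index]


# Hypothetical predictions: (alpha_outcome_1, alpha_outcome_2, alpha_outcome_3).
# Both candidates have the same mean for outcome 1 (0.6), but pegRNA_B has a
# larger total concentration, i.e. a tighter (more confident) distribution.
candidates = {
    "pegRNA_A": (6.0, 3.0, 1.0),
    "pegRNA_B": (60.0, 35.0, 5.0),
}

# Rank candidates from highest to lowest risk-adjusted score.
ranked = sorted(
    candidates,
    key=lambda name: risk_adjusted_score(candidates[name]),
    reverse=True,
)

for name in ranked:
    means, stds = dirichlet_summary(candidates[name])
    print(f"{name}: mean={means[0]:.3f} std={stds[0]:.3f} "
          f"score={risk_adjusted_score(candidates[name]):.3f}")
```

Under these assumptions, two candidates with identical point estimates are separated by their uncertainty: the more confident prediction ranks first. The penalty weight `k` is a design choice; `k = 0` recovers ranking by point estimate alone.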