Pan-cell-type prediction of splicing patterns from sequence and splicing factor expression

Journal: bioRxiv

Abstract

Alternative splicing is a core determinant of cell-type-specific gene expression in humans, and its dysregulation contributes to many diseases, including neurodegeneration, autoimmunity, and cancer. However, current deep learning models for predicting RNA expression from sequence are limited in how they handle cellular context dependence. These models typically achieve cell-type specificity by training a separate model or output head for each tissue or cell type, which assumes discrete, predefined cell types. This design not only prevents learning from pathological or experimentally perturbed transcriptomes but also precludes generalization to new cellular contexts. Here we introduce PanExonNet, a deep learning framework that integrates cis- and trans-regulation by conditioning sequence-based splicing predictions on an inferred splicing state derived from the expression of RNA-binding proteins (RBPs) and spliceosome components. PanExonNet is trained on diploid, individual-specific gene sequences containing indels to predict splicing profiles derived from short-read RNA-seq, and it outputs multiple tracks at single-nucleotide resolution as well as donor-acceptor junction usage. Compared with multi-headed baselines such as Borzoi and Pangolin, PanExonNet exhibits substantially higher context specificity (as quantified by a ΔPSI correlation metric designed to isolate context-specific variation) and, crucially, generalizes to unseen cell types. Adding perturbational knockdown followed by RNA sequencing (KD-RNA-seq) data to the training set further improves generalization. This performance is enabled in part by our introduction of contextualizable convolutions, a modular layer that may broadly benefit genomic sequence modeling. This framework provides a scalable foundation for future DNA-to-RNA models that could improve variant effect prediction, the design of oligonucleotide therapeutics, and biomarker discovery across diverse cellular contexts.
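The abstract does not spell out how the DeltaPSI (ΔPSI) correlation metric is computed. The sketch below assumes one plausible definition: for each exon, subtract its mean PSI across contexts, then compute the Pearson correlation between predicted and observed deviations pooled over all exons and contexts. Removing the per-exon mean discards between-exon variation, which would otherwise inflate a raw PSI correlation even for a model with no context specificity. The function names (`delta_psi`, `delta_psi_correlation`) and the exon-by-context list layout are hypothetical and are not taken from PanExonNet.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def delta_psi(psi: list[list[float]]) -> list[list[float]]:
    """Assumed DeltaPSI: subtract each exon's mean PSI across contexts.

    psi[i][c] is the PSI (fraction in [0, 1]) of exon i in context c.
    Each output row keeps only that exon's context-specific deviations.
    """
    out = []
    for row in psi:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out


def delta_psi_correlation(pred: list[list[float]], obs: list[list[float]]) -> float:
    """Pearson correlation of predicted vs observed DeltaPSI, pooled over exons and contexts."""
    dp = [v for row in delta_psi(pred) for v in row]
    do = [v for row in delta_psi(obs) for v in row]
    return pearson(dp, do)


# Toy data (hypothetical): two exons observed in two contexts.
obs = [[0.9, 0.7], [0.2, 0.4]]
# Exon means are right, but the context pattern is reversed.
pred_flip = [[0.7, 0.9], [0.4, 0.2]]
# Exon means are right, but there is no context variation at all.
pred_flat = [[0.8, 0.8], [0.3, 0.3]]

flat = lambda m: [v for row in m for v in row]
raw_r = pearson(flat(pred_flip), flat(obs))       # about 0.72, inflated by exon means
ctx_r = delta_psi_correlation(pred_flip, obs)     # about -1.0, wrong context pattern
flat_r = delta_psi_correlation(pred_flat, obs)    # NaN, no context signal to score
print(raw_r, ctx_r, flat_r)
```

In this toy case, a model that reverses every context-specific change still scores a positive raw PSI correlation because it ranks the exons correctly, whereas the mean-centered metric exposes the reversal. A real evaluation would also need to handle missing or low-coverage PSI values, which this sketch ignores.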

Authors

  • Vetsigian, K.
  • Lancaster, J.
  • Ieremie, I.
  • Radens, C. M.
  • Smyth, P.
  • Young, S.

Categories