Variant-resolved prediction of context-specific isoform variation with a graph-based attention model.

Journal: Cell Genomics
Published Date:

Abstract

In eukaryotes, most genes produce multiple transcript isoforms that diversify the transcriptome and proteome, serving as a key mechanism of functional regulation. Genetic variation can disrupt the RNA processing signals that shape isoform structure and abundance, yet modeling these effects at full-length isoform resolution remains challenging due to the complexity of transcript regulation. Here, we introduce Otari, an attention-based graph neural network framework trained on the human genomic sequence and long-read transcriptomes across 30 tissue types and brain regions. Otari predicts tissue-specific differential isoform abundance by integrating sequence-derived epigenetic and post-transcriptional signals, enabling isoform-resolved variant effect interpretation. Applied to large-scale variant datasets, including an autism cohort, Otari uncovers patterns of isoform dysregulation undetectable at the gene level, such as variant-driven perturbations in isoform abundance and microexon usage implicated in autism pathophysiology. We provide Otari as a resource for powering isoform-level analyses across tissues at scale.

Authors

  • Aviya Litman
    Quantitative and Computational Biology Program, Princeton University, Princeton, NJ 08540, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA.
  • Zhicheng Pan
    Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA, USA.
  • Ksenia Sokolova
    Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA.
  • Joyce Fang
    Quantitative and Computational Biology Program, Princeton University, Princeton, NJ 08540, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA.
  • Tess Marvin
    Quantitative and Computational Biology Program, Princeton University, Princeton, NJ 08540, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA.
  • Natalie Sauerwald
    Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA.
  • Christopher Y Park
    Flatiron Institute, Simons Foundation, New York, NY, USA.
  • Chandra L Theesfeld
    Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
  • Olga G Troyanskaya
    Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.