The Advances in Deep Learning Modeling of Polyadenylation Codes.
Journal:
Wiley interdisciplinary reviews. RNA
Published Date:
Jan 1, 2025
Abstract
3'-end cleavage and polyadenylation is an essential step of eukaryotic mRNA and lncRNA expression. The formation of a polyadenylation (polyA) site is determined by the combinatorial effects of multiple tandem motifs (~6 motifs in humans), each of which is bound by a protein subcomplex. However, motif occurrence and composition vary considerably across individual polyA sites, which makes it technically challenging to quantify polyadenylation activities and define cleavage sites. Although conventional motif enrichment analyses and machine learning models have identified contributing polyadenylation motifs, these approaches cannot quantify motif crosstalk in an unbiased manner. Recently, several groups have developed deep learning models to resolve sequence complexity, capture complex positional interactions among cis-regulatory motifs, examine polyA site formation, predict cleavage probability, and calculate site strength. These deep learning models have brought novel insights into polyadenylation biology, such as site configuration differences across species, cleavage heterogeneity, genomic parameters regulating site expression, and human genetic variants altering polyadenylation activities. In this review, we summarize advances in deep learning models developed to address facets of polyadenylation regulation and discuss applications of these models.