Characterizing Highly Conserved Fragments in 3'UTRs via Computational and Transfer Learning Approaches

Journal: bioRxiv
Published Date:

Abstract

3' untranslated regions (3'UTRs) serve as regulatory platforms that modulate translation, mRNA localization, and stability through the sequence-specific binding of regulators such as RNA-binding proteins (RBPs) and miRNAs. These vital binding sites are often identified through orthologous regions among species. A separate but related discovery is that of ultraconserved elements (UCEs), detected in the human, rat, and mouse genomes two decades ago; however, knowledge of their functions remains limited. Perplexingly, mouse embryos with altered UCEs can still produce viable progeny with no observable phenotypic differences. The majority of UCEs are non-coding, though ~8% are located in 3'UTRs. Given the importance of 3'UTRs in gene regulation, we use a computational approach to identify highly conserved fragments (CFs) in 3'UTRs across diverse mammals, applying criteria appropriate for 3'UTRs (>=50 bp and >=90% identity). The results show that CFs are not composed of simple repeats or low-complexity regions common to mammalian genomes. Using a transformer-based genomic foundation model, we characterize CFs as A- and T-rich and distinguishable from the 3'UTR background. Thirty-six human CFs from 25 genes are significantly depleted of sequence variation in humans. They are enriched in neuronal tissues and play roles in neurodevelopment and RNA processing, mediated by RBPs and miRNAs. Our findings expand on existing studies that attribute UCEs primarily to enhancer function, suggesting a new path to explore the biological roles of UCEs in 3'UTRs.
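
The CF thresholds stated in the abstract (>=50 bp and >=90% identity) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the use of a gapped pairwise alignment as input, and the choice to count gap columns as mismatches are all assumptions made for illustration only.

```python
# Illustrative sketch only; names and conventions are hypothetical,
# not taken from the paper.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment columns where both sequences carry the same base.

    Assumption: inputs are two rows of a pairwise alignment of equal length,
    with '-' marking gaps. Gap columns are counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return matches / len(aligned_a)


def meets_cf_criteria(
    aligned_a: str,
    aligned_b: str,
    min_len: int = 50,          # >=50 bp, as stated in the abstract
    min_identity: float = 0.90  # >=90% identity, as stated in the abstract
) -> bool:
    """Return True if the aligned fragment passes both thresholds."""
    long_enough = len(aligned_a) >= min_len
    return long_enough and percent_identity(aligned_a, aligned_b) >= min_identity


if __name__ == "__main__":
    human = "ACGT" * 15                 # 60-column toy fragment
    other = "TGCAT" + human[5:]         # 5 mismatching columns out of 60
    print(meets_cf_criteria(human, other))
```

A study across many species would need a multiple alignment and a rule for how identity is aggregated across species; the abstract does not specify that rule, so the sketch stays pairwise.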

Authors

  • Ho, E. S.
  • Baeck-Hubloux, A.
  • Dinh, N.
  • Severino, A.
  • Troy, C.

Categories