CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes
Journal: arXiv
Published Date: Jul 11, 2025
Abstract
Circular RNAs (circRNAs) are important components of the non-coding RNA
regulatory network. Existing circRNA identification methods rely primarily on
high-throughput RNA sequencing (RNA-seq) data combined with alignment-based
algorithms that detect back-splicing signals. However, these methods face
several limitations: they cannot predict circRNAs directly from genomic DNA
sequences and rely heavily on experimental RNA data; they involve high
computational costs due to complex alignment and filtering steps; and they are
inefficient for large-scale or genome-wide circRNA prediction. The challenge is
even greater in plants, where circRNA splice sites often lack the
canonical GT-AG motif seen in human mRNA splicing, and no efficient deep
learning model with strong generalization capability currently exists.
Furthermore, the number of currently identified plant circRNAs is likely far
lower than their true abundance. In this paper, we propose a deep learning
framework named CircFormerMoE, based on transformers and mixture-of-experts, for
predicting circRNAs directly from plant genomic DNA. Our framework consists of
two subtasks: splice site detection (SSD) and splice site pairing
(SSP). The model's effectiveness has been validated on gene data from 10 plant
species. Trained on known circRNA instances, it is also capable of discovering
previously unannotated circRNAs. In addition, we performed interpretability
analyses on the trained model to investigate the sequence patterns contributing
to its predictions. Our framework provides a fast and accurate computational
method and tool for large-scale circRNA discovery in plants, laying a
foundation for future research in plant functional genomics and non-coding RNA
annotation.
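
The remark above about the canonical GT-AG motif can be made concrete with a small check. The following is a minimal sketch and not part of CircFormerMoE: it assumes a plus-strand sequence, 0-based half-open coordinates, and a candidate circularized interval whose upstream intron would end in AG and whose downstream intron would begin with GT. The function names and the toy sequence are hypothetical.

```python
from typing import Tuple


def flanking_dinucleotides(genome: str, start: int, end: int) -> Tuple[str, str]:
    """Return the two bases just upstream of `start` and just downstream of `end`.

    Assumption: plus strand, 0-based half-open interval [start, end).
    """
    if start < 2 or end + 2 > len(genome) or start >= end:
        raise ValueError("interval needs two flanking bases on each side")
    upstream = genome[start - 2:start].upper()    # end of the upstream intron
    downstream = genome[end:end + 2].upper()      # start of the downstream intron
    return upstream, downstream


def has_canonical_flanks(genome: str, start: int, end: int) -> bool:
    """True if the interval is flanked by AG (acceptor side) and GT (donor side)."""
    upstream, downstream = flanking_dinucleotides(genome, start, end)
    return upstream == "AG" and downstream == "GT"


# Toy sequences (illustrative only, not real genomic data).
toy_canonical = "CCAG" + "ATGGCCTTA" + "GTAA"
toy_noncanonical = "CCTT" + "ATGGCCTTA" + "GTAA"

canonical_result = has_canonical_flanks(toy_canonical, 4, 13)
noncanonical_result = has_canonical_flanks(toy_noncanonical, 4, 13)
```

A rule of this kind would miss any junction without the canonical flanks, which is the situation the abstract describes for many plant circRNA splice sites.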