SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model
Journal: arXiv
Published Date: Jun 2, 2025
Abstract
Inspired by the success of unsupervised pre-training paradigms, researchers
have applied these approaches to DNA pre-training. However, we argue that these
approaches alone yield suboptimal results: pure DNA sequences lack sufficient
information, because their functions are regulated by genomic profiles such as
chromatin accessibility. Here, we demonstrate that supervised training for
genomic profile prediction serves as a more effective alternative to pure
sequence pre-training. Furthermore, considering the multi-species and
multi-profile nature of genomic profile prediction, we introduce
Species-Profile Adaptive Collaborative Experts (SPACE), which leverages a
Mixture of Experts (MoE) to better capture the relationships between DNA
sequences across different species and genomic profiles, thereby learning more
effective DNA representations. Through extensive experiments across various
tasks, our model
achieves state-of-the-art performance, establishing that DNA models trained
with supervised genomic profiles serve as powerful DNA representation learners.
The code is available at https://github.com/ZhuJiwei111/SPACE.
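
To make the Mixture of Experts (MoE) idea mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the SPACE implementation; the abstract does not describe the architecture in detail. The sketch assumes, purely for illustration, a shared pool of linear experts and one gating matrix per species that decides how to weight them. The class name `ToyMoELayer`, the species names, and all dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyMoELayer:
    """Illustrative MoE layer (an assumption, not the authors' design):
    experts are shared across species, gating is species-specific."""

    def __init__(self, dim, num_experts, species, seed=0):
        rng = random.Random(seed)
        # Shared pool of experts, each a dim x dim linear map.
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]
        # Hypothetical: one gating matrix per species, num_experts x dim.
        self.gates = {name: random_matrix(num_experts, dim, rng) for name in species}

    def gate_weights(self, x, species):
        """Mixing weights over experts for input x under a given species."""
        return softmax(linear(self.gates[species], x))

    def forward(self, x, species):
        """Weighted sum of all expert outputs (dense, non-sparse mixing)."""
        weights = self.gate_weights(x, species)
        outputs = [linear(expert, x) for expert in self.experts]
        return [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))
        ]


layer = ToyMoELayer(dim=4, num_experts=3, species=["human", "mouse"])
x = [0.1, -0.2, 0.3, 0.4]  # stand-in for a sequence embedding
human_out = layer.forward(x, "human")
mouse_out = layer.forward(x, "mouse")
```

The same input `x` is routed through the same experts for both species, but each species mixes them with its own gate weights. Practical MoE layers often keep only the top-k experts per input and use nonlinear experts; both are omitted here for brevity.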