Harnessing DNA Foundation Models for Cross-Species Transcription Factor Binding Site Prediction in Plant Genomes

Journal: bioRxiv
Published Date:

Abstract

Accurate prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. Experimental methods such as ChIP-seq and DAP-seq are informative, but they are labor-intensive and species-specific. Recent advances in large-scale pretrained DNA foundation models have shown promise in overcoming these limitations. This study evaluates three such models (DNABERT-2, AgroNT, and HyenaDNA) for predicting TFBSs in plants. Using DAP-seq data from Arabidopsis thaliana and Sisymbrium irio, we benchmark their accuracy against specialized methods such as DeepBind and BERT-TFBS. Our results demonstrate that foundation models, particularly HyenaDNA, offer superior predictive accuracy and computational efficiency, highlighting their potential for scalable, genome-wide TFBS prediction in plants.

Authors

  • Maryam Haghani; Krishna Vamsi Dhulipalla; Song Li