OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking
Journal: arXiv
Published Date: May 20, 2025
Abstract
The code of nature, embedded in DNA and RNA genomes since the origin of life,
holds immense potential to impact both humans and ecosystems through genome
modeling. Genomic Foundation Models (GFMs) have emerged as a transformative
approach to decoding the genome. As GFMs scale up and reshape the landscape of
AI-driven genomics, the field faces an urgent need for rigorous and
reproducible evaluation. We present OmniGenBench, a modular benchmarking
platform designed to unify the data, model, benchmarking, and interpretability
layers across GFMs. OmniGenBench enables standardized, one-command evaluation
of any GFM across five benchmark suites, with seamless integration of over 31
open-source models. Through automated pipelines and community-extensible
features, the platform addresses critical reproducibility challenges, including
data transparency, model interoperability, benchmark fragmentation, and the
limited interpretability of black-box models. OmniGenBench aims to serve as foundational
infrastructure for reproducible genomic AI research, accelerating trustworthy
discovery and collaborative innovation in the era of genome-scale modeling.