Evaluating the representational power of pre-trained DNA language models for regulatory genomics.

Journal: Genome Biology

Abstract

BACKGROUND: The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were evaluated only after fine-tuning their weights for each downstream task, it remains an open question whether gLM representations on their own embody a foundational understanding of cis-regulatory biology.
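The distinction the abstract draws, between fine-tuning gLM weights for each task and probing frozen pre-trained representations, can be made concrete with a short sketch. The checkpoint name, mean-pooling strategy, and linear probe below are illustrative assumptions, not the paper's exact pipeline; any HuggingFace-hosted gLM that exposes hidden states would work the same way.

```python
# Minimal sketch of probing frozen gLM representations with a linear model.
# Checkpoint, pooling, and probe are illustrative choices, not the paper's method.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
from sklearn.linear_model import LogisticRegression

# Assumed pre-trained gLM checkpoint on HuggingFace (Nucleotide Transformer).
MODEL_NAME = "InstaDeepAI/nucleotide-transformer-500m-human-ref"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME, trust_remote_code=True)
model.eval()  # weights stay frozen: no fine-tuning for the downstream task

def embed(sequences):
    """Mean-pool the final hidden layer into one vector per DNA sequence."""
    with torch.no_grad():
        tokens = tokenizer(sequences, return_tensors="pt", padding=True)
        out = model(**tokens, output_hidden_states=True)
        hidden = out.hidden_states[-1]                 # (batch, length, dim)
        mask = tokens["attention_mask"].unsqueeze(-1)  # ignore padding tokens
        pooled = (hidden * mask).sum(1) / mask.sum(1)
    return pooled.numpy()

# Toy labeled sequences standing in for a real regulatory genomics benchmark.
train_seqs = ["ACGT" * 50, "TTAA" * 50, "GCGC" * 50, "ATAT" * 50]
train_labels = np.array([1, 0, 1, 0])

# A linear probe on frozen embeddings tests whether cis-regulatory information
# is already linearly decodable from the pre-trained representations,
# independent of any task-specific fine-tuning.
probe = LogisticRegression(max_iter=1000).fit(embed(train_seqs), train_labels)
print(probe.predict(embed(["ACGT" * 50])))
```

If the frozen-representation probe approaches the performance of a fully fine-tuned model, the pre-trained representations plausibly capture cis-regulatory structure; a large gap suggests the gains reported in prior evaluations came mostly from fine-tuning itself.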

Authors

  • Ziqi Tang
    Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Institute for Neurodegenerative Diseases, and Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Ln Box 0518, San Francisco, CA, 94143, USA.
  • Nirali Somia
    Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
  • Yiyang Yu
    LPSM, Université de Paris, France.
  • Peter K Koo
Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, USA.