CREsted: modeling genomic and synthetic cell-type-specific enhancers across tissues and species.

Journal: Nature Methods
Published Date:

Abstract

Sequence-based deep learning models have become the state of the art for analyzing the genomic regulatory code. Particularly for enhancers, these models excel at deciphering the sequence grammar that underlies their activity. To enable end-to-end enhancer modeling and design, we developed a software package called CREsted (cis-regulatory element sequence training, explanation and design). It combines preprocessing and analysis of single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data, modeling of chromatin accessibility from sequence, sequence design and downstream analysis to decipher enhancer grammar. We demonstrate CREsted's functionality on a mouse cortex dataset and a human peripheral blood mononuclear cell dataset. Additionally, we use CREsted to compare mesenchymal-like cancer cell states between tumor types, and we investigate different strategies for fine-tuning genomic foundation models within CREsted. Finally, we train a model on a zebrafish development atlas and use it to design cell-type-specific enhancers, which we validate in vivo. Across these varied datasets, we demonstrate that CREsted facilitates efficient training and analysis, enabling scrutiny of enhancer logic and the design of synthetic enhancers across tissues and species.

Authors
