CREP: Cis-Regulatory Element Predictor Based on Fine-Tuned Enformer

Journal: bioRxiv
Published Date:

Abstract

A substantial fraction of disease-associated genetic variants reside in non-coding regions of the genome, where they act by perturbing cis-regulatory elements (CREs) such as enhancers, promoters, and insulators. While recent sequence-based deep learning models, such as Enformer, accurately predict continuous epigenomic signals from DNA sequence, they do not directly provide discrete and interpretable CRE annotations. Here, we present CREP (Cis-Regulatory Element Predictor), a fine-tuned version of Enformer trained to predict regulatory element identity from sequence using REgulamentary-derived annotations across multiple human cell types. Through a controlled experimental framework, we show that incorporating diverse cell types improves model performance. CREP leverages cell-type-specific training data to learn regulatory representations while producing a unified prediction of CRE identity from sequence. This benefit is illustrated by the Vanuatu SNP, a non-coding variant that creates a de novo erythroid regulatory element, which is correctly detected only when erythroid data are included during training. Error analysis further reveals that apparent misclassifications between enhancers and promoters reflect their shared regulatory architecture, supporting the view of CREs as a functional continuum rather than strictly discrete classes. Together, these results demonstrate that CREP enables interpretable prediction of regulatory element identity from sequence and provides a framework for the functional interpretation of non-coding genetic variation.

Authors

  • Stranieri, N.
  • Riva, S. G.
  • Hughes, J. R.

Categories