Enhancer Discovery from Human to Fly via Interpretable Deep Learning.

Journal: bioRxiv : the preprint server for biology

Abstract

Enhancers are essential non-coding DNA elements that regulate gene expression, yet their accurate identification remains a major challenge. We introduce a convolutional neural network-based framework for cross-species enhancer prediction that combines high accuracy with biological interpretability. Trained on human data, the framework achieves strong performance across human, mouse, and fly datasets, consistently outperforming existing methods in precision and F1 score. It generalizes effectively to datasets generated using diverse experimental assays. An ensemble strategy further improves prediction reliability by reducing false positives, which is critical for genome-wide applications. The framework supports fine-tuning on new species and retains strong performance even when adapted with as few as 20,000 enhancer sequences, making it well suited to newly sequenced genomes with limited experimental data. For interpretability and visualization, we apply class activation maps to identify sequence regions predictive of enhancer activity. Experimental validation in transgenic flies confirms the framework's predictive power: five of six tested candidates drove reporter expression, and four exhibited expression patterns supported by prior literature. These analyses highlight distinct sequence and contextual features that confer what we term enhancerness: enhancer sequences possess a characteristic, identifiable signature. Together, these findings position the framework as a practical, accurate, and interpretable tool for enhancer discovery across diverse species.

Authors

  • Luis M Solis
  • Geyenna Sterling-Lentsch
  • Marc S Halfon
  • Hani Z Girgis
