EDEN: multiscale expected density of nucleotide encoding for enhanced DNA sequence classification with hybrid deep learning.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: DNA sequences are fundamental carriers of genetic information, and their accurate classification is essential for understanding gene regulation, disease mechanisms, and translational genomics. Existing encoding methods often fail to capture local and long-range dependencies simultaneously.

RESULTS: We introduce EDEN (Expected Density of Nucleotide Encoding), a unified multiscale encoding framework based on kernel density estimation. EDEN captures position-specific and context-dependent nucleotide patterns and integrates them into a hybrid deep learning architecture. Across sixteen benchmark datasets covering promoter detection, core promoter detection, and transcription factor binding prediction, EDEN achieves the best average performance while using orders of magnitude fewer parameters than state-of-the-art models. All source code, pretrained models, and datasets are publicly available at https://github.com/zabihis/EDEN.

CONCLUSIONS: EDEN provides an efficient, biologically informed, and interpretable multiscale representation for genomic sequence classification. Its favorable parameter-performance ratio and consistency across tasks underscore its practicality for large-scale genomic applications.
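The abstract does not give the encoding formula, so the following is only an illustrative sketch of what a multiscale, kernel-density-based nucleotide encoding could look like. It is not the EDEN implementation. The function name `kde_density_encoding`, the Gaussian kernel, the row normalization, and the bandwidth values are all assumptions made for this example; the authors' actual method is in the repository linked above.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def kde_density_encoding(seq, bandwidths=(1.0, 3.0, 9.0)):
    """Hypothetical multiscale density encoding (not the authors' code).

    Returns an array of shape (len(bandwidths), 4, len(seq)). Entry
    [s, k, i] is the kernel-weighted share of nucleotide k around
    position i at scale s.
    """
    seq = seq.upper()
    n = len(seq)
    positions = np.arange(n, dtype=float)

    # One-hot indicator: onehot[k, j] is 1.0 when seq[j] is nucleotide k.
    onehot = np.array(
        [[1.0 if base == nt else 0.0 for base in seq] for nt in NUCLEOTIDES]
    )

    # diff[i, j] is the distance from evaluation position i to position j.
    diff = positions[:, None] - positions[None, :]

    channels = []
    for h in bandwidths:
        # Gaussian kernel (assumed); small h is local, large h is long-range.
        kernel = np.exp(-0.5 * (diff / h) ** 2)
        # Normalize so the weights used at each position i sum to 1.
        kernel /= kernel.sum(axis=1, keepdims=True)
        # density[k, i] = sum_j onehot[k, j] * kernel[i, j]
        channels.append(onehot @ kernel.T)

    return np.stack(channels)


if __name__ == "__main__":
    encoding = kde_density_encoding("ACGTTTGACA")
    print(encoding.shape)
```

In this sketch, each bandwidth yields one 4-channel track, so stacking several bandwidths gives a model both narrow (position-specific) and wide (context-level) views of the same sequence. Characters outside A, C, G, and T contribute to no channel, which is a simplification chosen here for brevity.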

Authors
