A Finetuning Deep Learning Framework for Pan-species Promoters with Pseudo Time Series Analysis on Time and Frequency Space.
Journal:
IEEE Journal of Biomedical and Health Informatics
Published Date:
May 9, 2025
Abstract
Promoter identification and classification play crucial roles in unraveling gene mechanisms. Promoters are characterized by specific motifs, known as elements, such as the TATA-box in eukaryotes and the Pribnow box in prokaryotes. These elements constitute the core components of a promoter and are intimately tied to its function. However, the heterogeneity of promoters across species poses a significant challenge to improving identification models. In this study, we introduce ProTriCNN, a deep learning method designed for promoter identification. Building on promoter representation, ProTriCNN treats promoters as pseudo-time series and uses this view to capture the intricate heterogeneity of promoter elements. Furthermore, we introduce TransPro, a ProTriCNN-based fine-tuning framework that improves identification performance across species. To better align source and target species, TransPro uses promoter elements and species evolutionary trees to represent the locality differences between source and target species across various levels and in time-frequency space, respectively. Compared to state-of-the-art methods, ProTriCNN demonstrates superior performance across all species, achieving an average accuracy improvement of 2.1% and a 20% improvement in the Matthews correlation coefficient. TransPro further attains an accuracy improvement of up to 8% and a 25% improvement in the Matthews correlation coefficient compared to ProTriCNN. The source code and the associated datasets are freely available at https://github.com/Limomo33/promoter.
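To make the "pseudo-time series" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple one-hot encoding, in which each sequence position is treated as a time step with four channels (A, C, G, T), and it uses a plain real FFT along the position axis as a stand-in for a frequency-space view. The function names `one_hot_series` and `frequency_view` and the example sequence are hypothetical; the encoding and transforms actually used by ProTriCNN are not stated in the abstract.

```python
import numpy as np

BASES = "ACGT"  # channel order is an assumption made for this sketch


def one_hot_series(seq):
    """Encode a DNA string as a (length, 4) array, one row per position.

    Bases outside A/C/G/T (for example N) are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    out = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        col = index.get(base)
        if col is not None:
            out[pos, col] = 1.0
    return out


def frequency_view(series):
    """Magnitude spectrum of each base channel along the position axis."""
    return np.abs(np.fft.rfft(series, axis=0))


# Hypothetical 16-nt sequence containing a Pribnow-box-like TATAAT motif.
seq = "TTGACATATAATGCGC"
time_view = one_hot_series(seq)        # positions as "time steps"
freq_view = frequency_view(time_view)  # periodicity of each base channel
```

In this sketch the time view keeps the positional order of the bases, while the frequency view summarizes how periodically each base recurs along the sequence. A model could consume either view or both.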