Prediction and analysis of prokaryotic promoters based on sequence features.
Journal: Bio Systems
PMID: 32755610
Abstract
Promoter recognition is an important but difficult part of functional genome annotation. Many studies have addressed the problem, but existing methods still cannot meet application needs. Most methods are specific in scope, and the objects they analyze are relatively simple, especially for prokaryotes. Hence, more research on prokaryotic promoters is needed. In this study, the similarity between gene expression and the transmission of information inspired us to analyze promoter sequences by calculating the information content of the sequences and the correlation between sequences within subregions. We also calculated other sequence features as supplements, such as the Hurst exponent, GC content, and sequence bending property. We then used an artificial neural network to build a classifier and applied it to identify promoters in three organisms: Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa. Experiments on the benchmark test set indicate that our method distinguishes promoters from randomly selected nonpromoters well: the maximal AUC of the classifier is 0.90 and the minimal AUC is 0.80. Additionally, cross-species experiments on the three organisms yielded an AUC of 0.8, suggesting that our approach generalizes well, which is conducive to revealing more common characteristics of prokaryotic promoters.
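To make the kind of sequence feature mentioned above concrete, the following is a minimal sketch of two of them: GC content and a Shannon-entropy measure of information content. The abstract does not give the authors' exact definitions, so the k-mer entropy used here, as well as the names `gc_content`, `kmer_entropy`, and `feature_vector`, are illustrative assumptions rather than the published method.

```python
from collections import Counter
from math import log2


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution.

    Assumption: this stands in for "information content"; the paper's
    own definition may differ.
    """
    seq = seq.upper()
    if k < 1 or len(seq) < k:
        raise ValueError("need k >= 1 and len(seq) >= k")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def feature_vector(seq: str) -> list[float]:
    """Hypothetical feature vector: GC content, 1-mer and 2-mer entropy."""
    return [gc_content(seq), kmer_entropy(seq, 1), kmer_entropy(seq, 2)]
```

A vector like this, computed per candidate sequence, is the sort of input a neural-network classifier could be trained on; the Hurst exponent and bending property would need their own definitions, which the abstract does not provide.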