ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Journal: Protein Science: A Publication of the Protein Society

PMID: 38747369

Abstract

Prokaryotic DNA binding proteins (DBPs) play pivotal roles in gene regulation, DNA replication, and various other cellular functions. Accurate computational models for predicting prokaryotic DBPs could accelerate the discovery of novel proteins, deepen our understanding of prokaryotic biology, and support the development of targeted therapeutics for disease intervention. However, existing generic prediction models often show lower accuracy when applied to prokaryotic DBPs. To address this gap, we introduce ProkDBP, a novel machine learning-driven computational model for the prediction of prokaryotic DBPs. A total of nine shallow learning algorithms and five deep learning models were evaluated, and the shallow learning models outperformed their deep learning counterparts. The light gradient boosting machine (LGBM), coupled with evolutionarily significant features selected via the random forest variable importance measure (RF-VIM), yielded the highest five-fold cross-validation accuracy. This model achieved the highest auROC (0.9534) and auPRC (0.9575) among the 14 machine learning models evaluated. ProkDBP also performed well on an independent dataset, achieving an auROC of 0.9332 and an auPRC of 0.9371. Notably, when benchmarked against several state-of-the-art existing models, ProkDBP showed superior predictive accuracy. To promote accessibility and usability, ProkDBP (https://iasri-sg.icar.gov.in/prokdbp/) is freely available as an online prediction tool. This tool adds to the resources available for accurate and efficient prediction of prokaryotic DBPs.
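The abstract reports two threshold-free metrics, auROC and auPRC. As an illustration only, the following minimal sketch shows one common way to compute them from predicted scores. It is not the authors' code: the function names, the toy labels, and the toy scores are all hypothetical, and auPRC is estimated here as average precision, which is one of several accepted estimators and may differ from the one used in the paper.

```python
from typing import Sequence


def au_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    This equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one, with
    ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate auPRC as the mean precision at the rank of each positive.

    Assumes scores are untied; with tied scores the result depends on
    the order in which the tied examples are ranked.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical toy data: 1 = DNA binding protein, 0 = non-binding protein.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(f"auROC = {au_roc(toy_labels, toy_scores):.4f}")
print(f"auPRC = {average_precision(toy_labels, toy_scores):.4f}")
```

On this toy input, eight of the nine positive-negative pairs are ranked correctly, so the auROC is 8/9. Both metrics are computed from the ranking of scores rather than from a single decision threshold, which is why they are commonly reported side by side when comparing classifiers.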

Authors

Upendra Kumar Pradhan

Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur (HP), India.

Prabina Kumar Meher

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India.

Sanchita Naha

Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India. Electronic address: sanchita.naha@icar.gov.in.

Ritwika Das

Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, India.

Ajit Gupta

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India. Electronic address: ajit@icar.gov.in.

Rajender Parsad

ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India. Electronic address: rajender.parsad@icar.gov.in.

Keywords

Algorithms; Bacterial Proteins; Computational Biology; DNA-Binding Proteins; Machine Learning
