DeepDNAbP: A deep learning-based hybrid approach to improve the identification of deoxyribonucleic acid-binding proteins.

Journal: Computers in Biology and Medicine

Published Date: Jun 1, 2022

Abstract

Accurate identification of DNA-binding proteins (DBPs) is critical for both understanding protein function and drug design. DBPs also play essential roles in many biological activities, such as DNA replication, repair, transcription, and splicing. Because experimental identification of DBPs is time-consuming and sometimes biased toward prediction, there is an urgent need for an effective DBP prediction model, and computational methods that can accurately predict potential DBPs from sequence information are highly desirable. In this paper, a novel predictor called DeepDNAbP is developed to accurately predict DBPs from sequences using a convolutional neural network (CNN) model. First, we apply three feature extraction methods, namely position-specific scoring matrix (PSSM), pseudo-amino acid composition (PseAAC), and tripeptide composition (TPC), to represent protein sequence patterns. Second, SHapley Additive exPlanations (SHAP) is employed to remove redundant and irrelevant features for predicting DBPs. Finally, the selected features are fed to the CNN classifier to construct the DeepDNAbP model for identifying DBPs. The final DeepDNAbP predictor achieves superior prediction performance in K-fold cross-validation tests and outperforms existing DBP predictors. DeepDNAbP is poised to be a powerful computational resource for the prediction of DBPs. The web application and curated datasets in this study are freely available at: http://deepdbp.sblog360.blog/.
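To make two of the pipeline stages in the abstract more concrete, here is a minimal sketch, not the authors' implementation. `tpc_features` computes a tripeptide composition (TPC) vector: the relative frequency of each of the 20^3 = 8000 possible tripeptides over overlapping windows of the sequence. `top_k_by_mean_abs_shap` ranks features by mean absolute SHAP value and keeps the top k. This is a common SHAP-based selection convention, assumed here because the abstract does not state the exact criterion. Both function names are hypothetical. The amino-acid ordering and the choice to skip windows containing non-standard residues are also assumptions. The SHAP matrix is assumed to be precomputed for a trained model, for example with the `shap` package. PSSM and PseAAC are omitted because they depend on PSI-BLAST profiles and physicochemical property tables.

```python
from itertools import product

# The 20 standard amino acids; this order fixes the feature index layout (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20**3 = 8000 tripeptides: "AAA", "AAC", ..., "YYY".
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
TRIPEPTIDE_INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tpc_features(sequence: str) -> list[float]:
    """Hypothetical TPC encoder: frequency of each of the 8000 tripeptides.

    Overlapping length-3 windows are counted. Windows containing non-standard
    residues (e.g. 'X') are skipped, which is an assumed handling choice.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = TRIPEPTIDE_INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # No valid windows (sequence too short or all non-standard residues).
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


def top_k_by_mean_abs_shap(shap_values: list[list[float]], k: int) -> list[int]:
    """Hypothetical selector: indices of the k features with largest mean |SHAP|.

    shap_values is an (n_samples x n_features) matrix, assumed to have been
    computed beforehand for a trained model.
    """
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(n_features)
    ]
    # Stable sort by descending importance; ties keep the lower index first.
    ranked = sorted(range(n_features), key=lambda j: -importance[j])
    return ranked[:k]
```

For example, `tpc_features("MKKLLPT")` yields five windows (MKK, KKL, KLL, LLP, LPT), so each of those five tripeptides gets a frequency of 0.2 and the other 7995 entries are 0. The resulting 8000-dimensional vectors are sparse for typical protein lengths. This sparsity is one reason a feature-selection step before the classifier is useful.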

Authors

Md Faruk Hosen

Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh.
S M Hasan Mahmud

Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka, 1229, Bangladesh.
Kawsar Ahmed

Group of Biophotomatiχ, Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh; Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh. Electronic address: kawsar.ict@mbstu.ac.bd.
Wenyu Chen

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Sichuan, 611731, China. Electronic address: cwy@uestc.edu.cn.
Mohammad Ali Moni

Bone Biology Divisions, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia; The University of Sydney, School of Medical Sciences, Faculty of Medicine & Health, NSW 2006, Australia. Electronic address: mohammad.moni@sydney.edu.au.
Hong-Wen Deng

Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA 70112, USA.
Watshara Shoombuatong

Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
Md Mehedi Hasan

Nutrition and Clinical Services Division, International Center for Diarrheal Disease and Research, Bangladesh (icddr,b), Dhaka, Bangladesh.

Keywords

Computational Biology; Deep Learning; DNA; DNA-Binding Proteins; Neural Networks, Computer; Position-Specific Scoring Matrices

External Resources

PubMed ID (PMID): 35378437
