DRBpred: A sequence-based machine learning method to effectively predict DNA- and RNA-binding residues.

Journal: Computers in biology and medicine
PMID:

Abstract

DNA-binding and RNA-binding proteins are essential to an organism's normal life cycle and perform diverse functions in many biological processes. DNA-binding proteins are crucial for DNA replication, transcription, repair, packaging, and gene expression. Likewise, RNA-binding proteins are essential for the post-transcriptional control of RNAs and for RNA metabolism. Identifying DNA- and RNA-binding residues is essential for biological research and for understanding the pathogenesis of many diseases. However, most DNA-binding and RNA-binding proteins remain to be discovered. This research explored various properties of protein sequences, such as amino acid composition type, Position-Specific Scoring Matrix (PSSM) values of amino acids, Hidden Markov Model (HMM) profiles, physicochemical properties, structural properties, torsion angles, and disordered regions. We used a sliding-window technique to extract additional information from a target residue's neighbors. We proposed an optimized Light Gradient Boosting Machine (LightGBM) method, named DRBpred, to predict DNA-binding and RNA-binding residues from the protein sequence. Compared to the state-of-the-art method, DRBpred improves Sensitivity, Matthews Correlation Coefficient (MCC), and AUC by 112.00 %, 33.33 %, and 6.49 %, respectively, on the DNA-binding test set, and by 112.50 %, 16.67 %, and 7.46 %, respectively, on the RNA-binding test set.
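
The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the zero-padding at the sequence ends, the window size, and the toy feature values are all assumptions made for illustration, since the abstract does not specify them. The idea is that each residue's feature vector is concatenated with the vectors of its neighbors on both sides, so a per-residue classifier sees local sequence context.

```python
from typing import List, Sequence


def sliding_window_features(
    per_residue: Sequence[Sequence[float]], window: int
) -> List[List[float]]:
    """Concatenate each residue's features with those of its neighbors.

    per_residue: one feature vector per residue, all of equal length.
    window: odd window size; window // 2 neighbors are taken on each side.
    Positions beyond either end of the sequence are zero-padded
    (an assumption; the abstract does not state a padding scheme).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not per_residue:
        return []

    half = window // 2
    dim = len(per_residue[0])
    padding = [0.0] * dim
    n = len(per_residue)

    rows: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        # Walk from the leftmost neighbor to the rightmost neighbor.
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < n else padding)
        rows.append(row)
    return rows


# Toy input: 4 residues with 2 hypothetical features each, window of 3.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
windowed = sliding_window_features(toy, window=3)
```

Each row of `windowed` has `window * dim` values and could serve as one training example for a per-residue binary classifier such as a gradient-boosted tree model.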

Authors

  • Md Wasi Ul Kabir
    Department of Computer Science, University of New Orleans, New Orleans, LA, USA.
  • Duaa Mohammad Alawad
    Department of Computer Science, University of New Orleans, New Orleans, LA, USA. Electronic address: dmalawad@uno.edu.
  • Pujan Pokhrel
    Department of Computer Science, University of New Orleans, New Orleans, LA, USA. Electronic address: ppokhre1@uno.edu.
  • Md Tamjidul Hoque
    Department of Computer Science, University of New Orleans, New Orleans, LA, USA.