DRBP-EDP: classification of DNA-binding proteins and RNA-binding proteins using ESM-2 and dual-path neural network.

Journal: NAR genomics and bioinformatics
Published Date:

Abstract

Regulation of DNA or RNA at the transcriptional, post-transcriptional, and translational levels is a key step in the central dogma of molecular biology. DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) play pivotal roles in the precise regulation of gene expression at these levels. Both classes are nucleic acid-binding proteins (NABPs), so they exhibit significant similarity in both sequence and structure. However, traditional methods for identifying NABPs are typically time-consuming, costly, and difficult to scale. Deep learning-based protein classification has emerged as a more efficient alternative. In this study, we propose DRBP-EDP, a phased classification method that integrates ESM-2 with a dual-path neural network. We also design a refined approach to dataset construction, which yields high-quality protein classification datasets. The model achieved 90.03% accuracy in the first stage, which classifies NABPs versus non-nucleic acid-binding proteins, and 89.56% accuracy in the second stage, which classifies DBPs versus RBPs. To enhance accessibility and usability, DRBP-EDP has been developed in both executable and web-based versions, which are publicly available at https://doi.org/10.5281/zenodo.14092184 and https://github.com/MuQiang-MQ/DRBP-EDP.
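
The phased decision logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_two_stage`, the `Scorer` type, the 0.5 threshold, and the toy scorers are all hypothetical, and the scorers merely stand in for trained models that would operate on ESM-2 embeddings.

```python
from typing import Callable, Sequence

# Hypothetical type: a scorer maps a per-protein feature vector
# (for example, a pooled ESM-2 embedding) to a probability in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def classify_two_stage(
    features: Sequence[float],
    stage1_nabp_prob: Scorer,
    stage2_dbp_prob: Scorer,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Stage 1 separates NABPs from non-NABPs; stage 2 runs only on
    predicted NABPs and separates DBPs from RBPs."""
    if stage1_nabp_prob(features) < threshold:
        return "non-NABP"
    if stage2_dbp_prob(features) >= threshold:
        return "DBP"
    return "RBP"


# Toy scorers for illustration only: they read the probability
# directly from the input instead of running a neural network.
toy_stage1: Scorer = lambda f: f[0]
toy_stage2: Scorer = lambda f: f[1]

labels = [
    classify_two_stage(f, toy_stage1, toy_stage2)
    for f in ([0.2, 0.9], [0.8, 0.9], [0.8, 0.1])
]
```

Under these assumptions, the second-stage scorer is never consulted for a protein rejected in the first stage, which is the essential property of a phased (cascaded) classifier.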

Authors

  • Qiang Mu
    Department of Breast Surgery, Qingdao Central Hospital, University of Health and Rehabilitation Sciences, Qingdao, China.
  • Guoping Yu
    National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences/Hainan Seed Industry Laboratory, Sanya 572024, China.
  • Guomin Zhou
    School of Medicine, Shanghai University, Shanghai, China.
  • Yubing He
    National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences/Hainan Seed Industry Laboratory, Sanya 572024, China.
  • Jianhua Zhang