PLM-DBPs: enhancing plant DNA-binding protein prediction by integrating sequence-based and structure-aware protein language models.

Journal: Briefings in bioinformatics
Published Date:

Abstract

DNA-binding proteins (DBPs) play a crucial role in gene regulation, development, and environmental responses across plants, animals, and microorganisms. Existing DBP prediction methods are largely limited to sequence information, whether through handcrafted features or sequence-based protein language models (PLMs), and so overlook structural cues critical to protein function. In addition, most existing tools are trained for general DBP prediction and are often inaccurate for plant-specific DBPs because of the unique structural and functional properties of plant proteins. We introduce PLM-DBPs, a deep learning framework that integrates sequence-based and structure-aware representations to enhance DBP prediction in plants. We evaluated several state-of-the-art PLMs for extracting high-dimensional protein representations and experimented with several fusion strategies to test whether these representations carry complementary information. Our final model, a fusion of sequence-based and structure-aware artificial neural network (ANN) models, notably improves DBP prediction in plants and outperforms previous state-of-the-art models. Although sequence-based PLMs already demonstrate strong performance in DBP prediction, our findings show that integrating structural information further enhances predictive accuracy. This underscores the complementary nature of structural representations and establishes PLM-DBPs as a robust tool for advancing plant research and agricultural innovation. The proposed model and other resources are publicly available at https://github.com/suresh-pokharel/PLM-DBPs.
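
To make the idea of fusing sequence-based and structure-aware models concrete, the following is a minimal sketch of two common fusion strategies: early fusion (concatenating embeddings before a single ANN) and late fusion (averaging the probabilities of two separate ANNs). The abstract does not state which strategy, embedding sizes, or network architecture PLM-DBPs uses, so everything here is an assumption for illustration: the embeddings are random stand-ins rather than real PLM outputs, the weights are untrained, and the names `ann_predict`, `random_ann`, `early_fusion`, and `late_fusion` are hypothetical.

```python
import numpy as np

def ann_predict(x, w1, b1, w2, b2):
    """One-hidden-layer feedforward net: ReLU hidden layer, sigmoid output."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    logits = (hidden @ w2 + b2).ravel()
    return 1.0 / (1.0 + np.exp(-logits))

def random_ann(rng, in_dim, hidden_dim=16):
    """Untrained placeholder weights (assumption: a real model would be trained)."""
    return (rng.normal(scale=0.1, size=(in_dim, hidden_dim)), np.zeros(hidden_dim),
            rng.normal(scale=0.1, size=(hidden_dim, 1)), np.zeros(1))

def early_fusion(seq_emb, struct_emb):
    """Concatenate per-protein embeddings along the feature axis."""
    return np.concatenate([seq_emb, struct_emb], axis=1)

def late_fusion(p_seq, p_struct, w_seq=0.5):
    """Weighted average of the two models' DBP probabilities."""
    return w_seq * p_seq + (1.0 - w_seq) * p_struct

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 proteins, 8-dim sequence and 6-dim structure embeddings.
seq_emb = rng.normal(size=(4, 8))     # stand-in for a sequence-based PLM embedding
struct_emb = rng.normal(size=(4, 6))  # stand-in for a structure-aware PLM embedding

# Late fusion: one ANN per representation, then combine the probabilities.
p_seq = ann_predict(seq_emb, *random_ann(rng, 8))
p_struct = ann_predict(struct_emb, *random_ann(rng, 6))
fused = late_fusion(p_seq, p_struct)

# Early fusion: a single ANN over the concatenated representation.
joint = early_fusion(seq_emb, struct_emb)
p_joint = ann_predict(joint, *random_ann(rng, joint.shape[1]))

# Threshold at 0.5 to call a protein DNA-binding (threshold is an assumption).
is_dbp = fused >= 0.5
print("late-fusion probabilities:", np.round(fused, 3))
print("early-fusion probabilities:", np.round(p_joint, 3))
```

Late fusion keeps each model independent, so either representation can be swapped or retrained without touching the other; early fusion lets a single network learn interactions between sequence and structure features. The authors' released code at the repository above is the reference for what PLM-DBPs actually does.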

Authors

  • Suresh Pokharel
    Department of Computer Science, College of Computing, Michigan Technological University, Houghton, MI, USA.
  • Kepha Barasa
    College of Computing, Michigan Technological University, Houghton 49931, MI, United States.
  • Pawel Pratyush
    Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States.
  • Dukka B Kc
    Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, 27411, USA. dbkc@ncat.edu.