DHUpredET: A comparative computational approach for identification of dihydrouridine modification sites in RNA sequence.

Journal: Analytical biochemistry
Published Date:

Abstract

Laboratory-based detection of dihydrouridine (D) sites is laborious and expensive. In this study, we developed effective machine learning models that employ efficient feature encoding methods to identify D sites. Initially, we explored various state-of-the-art feature encoding approaches, evaluated 30 machine learning techniques for each, and selected the top eight models based on their independent testing and cross-validation outcomes. Finally, we developed DHUpredET using the extra trees classifier for predicting D sites. The DHUpredET model demonstrated balanced performance across all evaluation criteria, outperforming state-of-the-art models by 8% in accuracy and 14% in sensitivity on an independent test set. Further analysis revealed that the model achieved higher accuracy with position-specific two nucleotide (PS2) features, leading us to conclude that PS2 features are the best suited for the DHUpredET model. Our proposed model therefore emerges as a preferred choice for predicting D sites. In addition, we conducted an in-depth analysis of local features and identified a particularly significant attribute, PS2_299, with a feature score of 0.035. This tool holds promise for accelerating the discovery of D modification sites, which could contribute to targeted therapeutics and to a better understanding of RNA structure.
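
As an illustration of the PS2 encoding mentioned in the abstract, the following is a minimal sketch that assumes a commonly used definition: each pair of adjacent positions in the sequence is one-hot encoded over the 16 possible dinucleotides. The abstract does not specify the exact encoding, so the function name `ps2_encode`, the feature ordering, and the handling of ambiguous bases are assumptions made here for illustration only.

```python
from itertools import product

# Assumed alphabet: RNA bases, with T mapped to U for convenience.
NUCLEOTIDES = "ACGU"
# The 16 dinucleotides in lexicographic order: AA, AC, AG, AU, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]
DINUC_INDEX = {dinuc: i for i, dinuc in enumerate(DINUCLEOTIDES)}


def ps2_encode(sequence):
    """Hypothetical PS2 encoder: one 16-wide one-hot block per adjacent pair.

    A sequence of length n yields (n - 1) * 16 binary features.
    Pairs containing an unknown base (e.g. N) are left as all zeros.
    """
    seq = sequence.upper().replace("T", "U")
    features = []
    for pos in range(len(seq) - 1):
        block = [0] * len(DINUCLEOTIDES)
        idx = DINUC_INDEX.get(seq[pos:pos + 2])
        if idx is not None:
            block[idx] = 1
        features.extend(block)
    return features


# Short example sequence (illustrative only, not taken from the study data).
example = ps2_encode("ACGU")
```

Under these assumptions, feature vectors built this way for fixed-length sequence windows could be stacked into a matrix and passed to a tree-ensemble classifier such as scikit-learn's `ExtraTreesClassifier`. The feature indexing used here is not claimed to match the paper's PS2_299 attribute.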

Authors

  • Md Fahim Sultan
    Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
  • Tasmin Karim
    Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
  • Md Shazzad Hossain Shaon
    Department of Computer Science and Engineering, Oakland University, Rochester, MI, 48309, USA. Electronic address: shaon@oakland.edu.
  • Sayed Mehedi Azim
    Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Badda, Dhaka, 1212, Bangladesh.
  • Iman Dehzangi
    Department of Computer Science, Rutgers University, Camden, NJ, United States.
  • Mst Shapna Akter
    Department of Electrical and Computer Engineering, North South University, Bashundhara, Dhaka, Bangladesh.
  • Sobhy M Ibrahim
    Department of Biochemistry, College of Science, King Saud University, P.O. Box: 2455, Riyadh, 11451, Saudi Arabia. Electronic address: syakout@ksu.edu.sa.
  • Md Mamun Ali
    Department of Software Engineering (SWE), Daffodil International University (DIU), Sukrabad, Dhaka, 1207, Bangladesh.
  • Kawsar Ahmed
    Group of Biophotomatiχ, Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh; Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh. Electronic address: kawsar.ict@mbstu.ac.bd.
  • Francis M Bui
    Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK S7N 5A9, Canada.