ILGBMSH: an interpretable classification model for the shRNA target prediction with ensemble learning algorithm.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Short hairpin RNA (shRNA)-mediated gene silencing is an important technology for achieving RNA interference, and the design of potent and reliable shRNA molecules is central to it. However, selecting efficient shRNA targets experimentally is expensive and time-consuming, so a more precise and efficient computational method for designing potent and reliable shRNA molecules is needed. In this work, we present ILGBMSH, an interpretable classification model for shRNA target prediction based on the Light Gradient Boosting Machine algorithm. Rather than using only shRNA sequence features, we extracted 554 biological and deep learning features that were not considered in previous shRNA prediction research. We compared the performance of our model with that of state-of-the-art shRNA target prediction models. In addition, we examined feature explanations derived from the model's parameters and from the interpretability method Shapley Additive Explanations, which provided biological insights from the model. We used independent shRNA experimental data from other sources to validate the predictive ability and robustness of our model. Finally, we used our model to design miR30-shRNA sequences and conducted a gene knockdown experiment. The experimental results agreed closely with our predictions, with a Pearson correlation coefficient of 0.985. In summary, the ILGBMSH model can achieve state-of-the-art shRNA prediction performance and yield biological insights from the machine learning model parameters.
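
The abstract reports agreement between model output and the knockdown experiment as a Pearson correlation coefficient. As a minimal sketch of how such a figure is computed, the snippet below implements the standard formula in plain Python. The function name `pearson_r` and the two input lists are hypothetical and for illustration only; they are not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from its sequence mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only; NOT data from the paper.
predicted_scores = [0.10, 0.35, 0.50, 0.72, 0.90]
measured_knockdown = [0.15, 0.30, 0.55, 0.70, 0.95]

r = pearson_r(predicted_scores, measured_knockdown)
print(f"Pearson r = {r:.3f}")
```

A value near 1 indicates that higher predicted scores go with stronger measured knockdown in an approximately linear way. It does not by itself show that the predicted and measured values are on the same scale.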

Authors

  • Chengkui Zhao
    College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China.
  • Nan Xu
    Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA.
  • Jingwen Tan
    Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, PR China.
  • Qi Cheng
    Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China.
  • Weixin Xie
    Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China.
  • Jiayu Xu
    College of Computer Science and Technology, Jilin University, 130012 Changchun, China.
  • Zhenyu Wei
    Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China.
  • Jing Ye
    Department of Digestive System Diseases, The First Affiliated Hospital, Shihezi University School of Medicine, Shihezi, Xinjiang Province, China.
  • Lei Yu
    School of Urban and Environmental Sciences, Central China Normal University, Wuhan 430079, China; Key Laboratory for Geographical Process Analysis & Simulation of Hubei Province, Central China Normal University, Wuhan 430079, China.
  • Weixing Feng
    Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China.