Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model.

Journal: Protein science : a publication of the Protein Society
Published Date:

Abstract

Predicting protein-protein interactions (PPIs) is a challenging task and is essential for constructing protein interaction networks, which are important for understanding the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to detect PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements are as follows: (1) protein sequences are represented using the BiGP feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, Principal Component Analysis (PCA) is used to reduce the dimension of the BiGP vector; (3) the powerful and robust RVM algorithm is used for classification. Five-fold cross-validation experiments on the yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. These results are significantly better than those of previous methods. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset.
The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate further studies, we developed a freely available web server called RVM-BiGP-PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/.
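
As a rough illustration of the BiGP feature step described in the abstract, the sketch below computes bi-gram features from a PSSM. The abstract does not give the formula, so this assumes the commonly used formulation in which entry (m, n) sums, over consecutive sequence positions, the product of the score for residue type m at position i and the score for residue type n at position i + 1. The function names `bigram_features` and `pair_features` are hypothetical, and concatenating the two proteins' vectors to describe a pair is likewise an assumption rather than a detail confirmed by the abstract.

```python
import numpy as np


def bigram_features(pssm):
    """Flatten the K x K bi-gram matrix of an L x K PSSM (K is 20 for proteins).

    Assumed formulation: B[m, n] = sum_i pssm[i, m] * pssm[i + 1, n].
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[0] < 2:
        raise ValueError("expected an L x K matrix with L >= 2")
    # Rows 0..L-2 paired with rows 1..L-1 give every consecutive position pair.
    bigram = pssm[:-1].T @ pssm[1:]
    return bigram.ravel()


def pair_features(pssm_a, pssm_b):
    """Describe a protein pair by concatenating both bi-gram vectors (assumption)."""
    return np.concatenate([bigram_features(pssm_a), bigram_features(pssm_b)])
```

With 20 residue types, each protein yields a 400-dimensional vector regardless of sequence length, which is what makes a subsequent dimension-reduction step such as PCA and a fixed-input classifier applicable.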

Authors

  • Ji-Yong An
    School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China.
  • Fan-Rong Meng
    School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China.
  • Zhu-Hong You
    Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China. zhuhongyou@ms.xjb.ac.cn.
  • Xing Chen
    School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221116, China. xingchen@amss.ac.cn.
  • Gui-Ying Yan
    Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100010, China.
  • Ji-Pu Hu
    School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China.