Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis.

Journal: Computational biology and chemistry
Published Date:

Abstract

Understanding the molecular mechanism underlying a disease requires knowledge of the interacting proteins in the disease pathway. The number of known protein-protein interactions (PPIs) is still very limited compared with the number of available protein sequences from different organisms. Experimental high-throughput technologies provide some data on these interactions, but the data are often fairly noisy. Computational techniques for predicting protein-protein interactions are therefore important. In this work, 1296 binary fingerprints that encode a combination of structural and geometric properties were developed from the crystallographic data of 15,000 protein complexes in the Protein Data Bank (PDB). In a case study, these fingerprints were generated for proteins implicated in type 2 diabetes mellitus. The fingerprints were input into a support vector machine (SVM) model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC of 0.78) on an external data set composed of proteins retrieved via text mining of diabetes-related literature. A PPI network was then constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease-related proteins, for functional annotation, and for other proteomics studies.
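
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the SVM kernel, software, or training settings, so the example assumes a linear soft-margin SVM trained by stochastic subgradient descent on the hinge loss. It also assumes that 1296 is the length of each binary fingerprint. The fingerprints are random synthetic stand-ins rather than PDB-derived data, and all function names and parameter values (`N_INFORMATIVE`, `lr`, `lam`, the sample sizes) are hypothetical.

```python
import random

N_BITS = 1296        # assumed fingerprint length (our reading of the abstract)
N_INFORMATIVE = 60   # hypothetical number of class-dependent bits


def synthetic_fingerprint(rng, label):
    """Return the set-bit indices of a made-up fingerprint (NOT real PDB data)."""
    on_bits = []
    for i in range(N_BITS):
        # Background bits are on with probability 0.2; for the "disease"
        # class (label 1) the first N_INFORMATIVE bits are on more often.
        p = 0.7 if (i < N_INFORMATIVE and label == 1) else 0.2
        if rng.random() < p:
            on_bits.append(i)
    return on_bits


def make_dataset(rng, n_per_class):
    """Build a shuffled list of (set-bit indices, label) pairs, labels +1/-1."""
    data = [(synthetic_fingerprint(rng, y), y)
            for y in (1, -1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def score(w, b, on_bits):
    """Linear decision value w.x + b for a sparse binary vector."""
    return sum(w[i] for i in on_bits) + b


def train_linear_svm(data, rng, epochs=15, lr=0.005, lam=0.01):
    """Stochastic subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * N_BITS
    b = 0.0
    order = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for on_bits, y in order:
            violated = y * score(w, b, on_bits) < 1.0
            shrink = 1.0 - lr * lam          # gradient step of the L2 penalty
            w = [wi * shrink for wi in w]
            if violated:                     # hinge-loss subgradient step
                for i in on_bits:
                    w[i] += lr * y
                b += lr * y
    return w, b


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def run_demo(seed=0):
    """Train on synthetic data, then report accuracy and AUC on a held-out set."""
    rng = random.Random(seed)
    train = make_dataset(rng, 100)
    test = make_dataset(rng, 50)
    w, b = train_linear_svm(train, rng)
    scores = [score(w, b, x) for x, _ in test]
    labels = [y for _, y in test]
    correct = sum(1 for s, y in zip(scores, labels) if (1 if s >= 0 else -1) == y)
    return correct / len(test), roc_auc(scores, labels)


if __name__ == "__main__":
    accuracy, auc = run_demo()
    print(f"held-out accuracy = {accuracy:.3f}, AUC = {auc:.3f}")
```

The numbers this sketch prints describe only the synthetic data and are not comparable to the 78.2% accuracy and 0.78 AUC reported above, which were measured on an external set of text-mined proteins.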

Authors

  • Renu Vyas
  • Sanket Bapat
  • Esha Jain
  • Muthukumarasamy Karthikeyan
    Digital Information Resource Centre (DIRC) & Centre of Excellence in Scientific Computing (CoESC), CSIR-National Chemical Laboratory, Pune, 411008, India. m.karthikeyan@ncl.res.in
  • Sanjeev Tambe
    Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008, India.
  • Bhaskar D Kulkarni