Assessment of Semantic Similarity between Proteins Using Information Content and Topological Properties of the Gene Ontology Graph.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
Published Date:

Abstract

The semantic similarity between two interacting proteins can be estimated by combining the similarity scores of the GO terms associated with the proteins. A greater number of similar GO annotations between two proteins indicates greater interaction affinity. Existing semantic similarity measures make use of the GO graph structure, the information content of GO terms, or a combination of both. In this paper, we present a hybrid approach that uses both the topological features of the GO graph and the information content of the GO terms. More specifically, we 1) consider a fuzzy clustering of the GO graph based on the level of association of the GO terms, 2) estimate the membership of each GO term in each cluster center based on the respective shortest path lengths, and 3) assign weights to GO term pairs on the basis of their dissimilarity with respect to the cluster centers. We test the performance of our semantic similarity measure against seven previously published similarity measures using benchmark protein-protein interaction datasets of Homo sapiens and Saccharomyces cerevisiae, based on sequence similarity, Pfam similarity, area under the ROC curve, and F1 measure.
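
The two ingredients named in the abstract, information content and shortest-path-based cluster membership, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the toy graph, the term names, the annotation counts, the choice of cluster centers, and the inverse-distance membership formula are hypothetical and are not taken from the paper. The abstract does not give the authors' membership or pair-weighting formulas, so the weighting step is not reproduced.

```python
import math
from collections import deque

# Hypothetical toy DAG: child term -> parent terms (not real GO identifiers).
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A"], "D": ["A", "B"]}

# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "C": 4, "D": 1}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the share of annotations at or below t."""
    cumulative = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items()}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def shortest_path_lengths(source):
    """Breadth-first search over the DAG treated as an undirected graph."""
    neighbours = {t: set(ps) for t, ps in PARENTS.items()}
    for child, ps in PARENTS.items():
        for p in ps:
            neighbours[p].add(child)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def memberships(term, centers):
    """Assumed rule: membership proportional to 1 / (1 + path length), normalized."""
    raw = {c: 1.0 / (1.0 + shortest_path_lengths(c)[term]) for c in centers}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


ic = information_content()
print(resnik("C", "D", ic))          # IC of the shared ancestor "A"
print(memberships("C", ["A", "B"]))  # "C" lies closer to center "A" than to "B"
```

In this sketch a term that sits closer to a cluster center receives a larger membership in it, and the memberships of each term sum to one, which is the usual convention in fuzzy clustering.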

Authors

  • Pritha Dutta
  • Subhadip Basu
  • Mahantapas Kundu
    Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India.