Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

Journal: Genomics
Published Date:

Abstract

Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a semantic method for gene function prediction based on Gene Ontology Hierarchy Preserving Hashing (HPHash). HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms and to optimize a series of hashing functions that encode the massive set of GO terms as compact binary codes. After that, HPHash uses these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic-similarity-based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and is robust to the number of hash functions. In addition, we use HPHash as a plugin for BLAST-based gene function prediction, and the experimental results show that HPHash again significantly improves prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.
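The prediction stage summarized in the abstract (project the gene-term association matrix with hashing functions, then predict by similarity in the low-dimensional space) can be illustrated with a minimal sketch. It assumes the hashing functions are already given as a fixed m x k matrix of +1/-1 term codes (`codes`, hand-picked toy values here, not learned). The taxonomic similarity measure, the hierarchy preserving optimization, and the exact scoring rule used by HPHash are not reproduced. The cosine-weighted neighbor vote is a generic stand-in, and all names (`project`, `cosine`, `predict`) and data are hypothetical.

```python
import math

def project(assoc, codes):
    """Project an n x m gene-term matrix to n x k: row_i -> sum_t assoc[i][t] * codes[t]."""
    k = len(codes[0])
    return [[sum(row[t] * codes[t][j] for t in range(len(row))) for j in range(k)]
            for row in assoc]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(assoc, codes, query):
    """Score every term for gene `query` by similarity-weighted votes of the other genes.

    Similarities are computed in the projected (low-dimensional) space and
    negative similarities are clamped to zero (an assumption of this sketch).
    """
    low = project(assoc, codes)
    m = len(assoc[0])
    scores = [0.0] * m
    total = 0.0
    for g, row in enumerate(assoc):
        if g == query:
            continue
        s = max(cosine(low[query], low[g]), 0.0)
        total += s
        for t in range(m):
            scores[t] += s * row[t]
    # Normalize so each score lies in [0, 1].
    return [x / total for x in scores] if total else scores

# Hypothetical toy data: 4 genes x 4 GO terms; gene 3 is only partially annotated.
assoc = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
# Hypothetical 2-bit codes per term (terms 0 and 1 share a code).
codes = [
    [1, 1],
    [1, 1],
    [1, -1],
    [-1, -1],
]
scores = predict(assoc, codes, query=3)
print([round(s, 3) for s in scores])  # [1.0, 1.0, 0.472, 0.0]
```

In this toy run, term 1 (unannotated for gene 3 but sharing a code with its known term 0) receives the highest score. Compressing m terms into k bits is what keeps the similarity computation cheap when m is large.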

Authors

  • Yingwen Zhao
  • Guangyuan Fu
    College of Computer and Information Science, Southwest University, Chongqing 400715, China.
  • Jun Wang
    Department of Speech, Language, and Hearing Sciences and the Department of Neurology, The University of Texas at Austin, Austin, TX 78712, USA.
  • Maozu Guo
    School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.
  • Guoxian Yu
    College of Computer and Information Science, Southwest University, Chongqing 400715, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.