Protein function prediction using GO similarity-based heterogeneous network propagation.

Journal: Scientific Reports
Published Date:

Abstract

Protein function prediction is a cornerstone of bioinformatics, providing critical insights into biological processes and disease mechanisms. Despite significant advances, challenges persist due to data sparsity and functional ambiguity. We introduce GOHPro (GO Similarity-based Heterogeneous Network Propagation), a novel method that constructs a heterogeneous network by integrating protein functional similarity (derived from domain profiles and modular complexes) with GO semantic relationships. The method then applies a network propagation algorithm to prioritize annotations based on multi-omics context. When evaluated on yeast and human datasets, GOHPro outperformed six state-of-the-art methods. Specifically, it achieved Fmax improvements ranging from 6.8% to 47.5% over methods such as exp2GO across the Biological Process (BP), Molecular Function (MF), and Cellular Component (CC) ontologies in both yeast and human. Case studies on proteins with shared domains, such as AAA+ ATPases, demonstrated GOHPro's ability to resolve functional ambiguity by leveraging contextual interactions and modular complexes. Further validation on the CAFA3 benchmark confirmed its generalizability, with Fmax gains exceeding 62% over baseline approaches in human. Our analysis revealed that homology and network connectivity critically influence prediction robustness, with the modular similarity network compensating for evolutionary gaps in dark proteins. The framework's extensibility to de novo structural predictions highlights its potential to bridge the annotation gap in uncharacterized proteomes.
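
The abstract does not specify which propagation algorithm GOHPro uses, so the following is only a minimal illustrative sketch of the general idea, assuming a random walk with restart (a common choice for network propagation) over a two-layer network of proteins and GO terms. The function names, the `restart` parameter, and the toy matrices are all hypothetical and are not taken from the paper.

```python
import numpy as np


def column_normalize(m):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=0, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)


def propagate_go_scores(pp, gg, pg, query, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from one query protein (assumed scheme).

    pp: protein-protein similarity (n x n)
    gg: GO-GO semantic similarity (m x m)
    pg: known protein-GO annotations (n x m)
    Returns one score per GO term; higher means closer to the query.
    """
    pp, gg, pg = (np.asarray(x, dtype=float) for x in (pp, gg, pg))
    n_proteins = pp.shape[0]

    # Stack both layers and their cross links into one adjacency matrix.
    adjacency = np.block([[pp, pg], [pg.T, gg]])
    transition = column_normalize(adjacency)

    # All restart probability sits on the query protein.
    seed = np.zeros(adjacency.shape[0])
    seed[query] = 1.0

    scores = seed.copy()
    for _ in range(max_iter):
        updated = (1.0 - restart) * transition @ scores + restart * seed
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break

    # Keep only the GO-term part of the stationary vector.
    return scores[n_proteins:]


# Toy data (hypothetical): protein 0 is unannotated but similar to protein 1,
# which carries GO term 0; protein 2 carries GO term 1.
toy_pp = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.1], [0.0, 0.1, 0.0]]
toy_gg = [[0.0, 0.2], [0.2, 0.0]]
toy_pg = [[0, 0], [1, 0], [0, 1]]
toy_scores = propagate_go_scores(toy_pp, toy_gg, toy_pg, query=0)
```

In this toy network, GO term 0 receives the higher score for protein 0 because it is reachable through the strongly similar protein 1, which is the intuition behind ranking candidate annotations by propagated score.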

Authors

  • Sai Hu
    School of Mathematics, Changsha University, Changsha, 410022, Hunan, China.
  • Bihai Zhao
    Department of Mathematics and Computer Science, Changsha University, Changsha, 410022, China.