GOBeacon: An ensemble model for protein function prediction enhanced by contrastive learning.

Journal: Protein science : a publication of the Protein Society
Published Date:

Abstract

Accurate prediction of protein function is fundamental to understanding biological processes, and computational methods are becoming increasingly essential as experimental methods struggle to keep pace with the rate at which new proteins are discovered. Despite significant advances in machine learning approaches, existing methods often fail to capture the complex relationships between protein structure, evolution, and function, which limits prediction accuracy. The challenge lies in effectively integrating diverse biological data types while maintaining computational efficiency. Here, we show that GOBeacon, a novel ensemble model integrating structure-aware protein language model embeddings with protein-protein interaction networks, achieves high accuracy in protein function prediction. By employing a contrastive learning framework, GOBeacon demonstrates superior performance on the sequence-based CAFA3 benchmark, achieving Fmax scores of 0.561 for biological process (BP), 0.583 for molecular function (MF), and 0.651 for cellular component (CC), outperforming existing methods including Domain-PFP and DeepGOPlus. The model's effectiveness extends to structure-based function prediction tasks, where it matches or exceeds the performance of specialized structure-based tools such as HEAL and DeepFRI, despite not being explicitly trained on structure. We anticipate that GOBeacon's architecture will serve as a foundation for next-generation protein analysis tools, while its modular design enables future integration of additional data types and improved prediction capabilities. These advances represent a significant step toward reliable automated protein function annotation, addressing a critical bottleneck in modern biology. GOBeacon is publicly available at https://github.com/wlin16/GOBeacon.git.
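
The scores reported for CAFA3 are conventionally protein-centric maximum F-measures (Fmax): precision is averaged over proteins with at least one prediction at a given threshold, recall is averaged over all annotated proteins, and the best F-measure across thresholds is reported. The sketch below illustrates that metric under simplifying assumptions; it is not taken from the GOBeacon repository. The function name `fmax`, the input layout (a set of true GO terms and a term-to-score dictionary per protein), and the 0.01 threshold grid are all hypothetical choices, and the propagation of annotations up the GO hierarchy that a full CAFA evaluation performs is omitted.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax (illustrative sketch, no GO term propagation).

    y_true  : list of sets, the true GO terms for each protein
    y_score : list of dicts mapping GO term -> predicted score in [0, 1]
    """
    if thresholds is None:
        # Hypothetical grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    # Recall is averaged over every protein that has at least one true term.
    n_annotated = sum(1 for truth in y_true if truth)
    best = 0.0

    for t in thresholds:
        precisions = []   # one entry per protein with >= 1 prediction at t
        recall_sum = 0.0

        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            true_pos = len(predicted & truth)
            if predicted:
                precisions.append(true_pos / len(predicted))
            if truth:
                recall_sum += true_pos / len(truth)

        # Skip thresholds at which nothing is predicted for any protein.
        if not precisions or n_annotated == 0:
            continue

        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n_annotated
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))

    return best


# Toy data with made-up term identifiers, for illustration only.
toy_truth = [{"GO:1", "GO:2"}, {"GO:3"}]
toy_scores = [
    {"GO:1": 0.9, "GO:2": 0.4, "GO:4": 0.6},
    {"GO:3": 0.8, "GO:1": 0.3},
]
toy_fmax = fmax(toy_truth, toy_scores)
```

On the toy data the maximum is reached for thresholds between 0.3 and 0.4, where both proteins recover all of their true terms and only one false positive remains. In practice the metric would be computed separately for each ontology (BP, MF, CC), which is why three values are reported.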

Authors

  • Weining Lin
    Institute of Structural and Molecular Biology, University College London, London, UK.
  • David Miller
    School of EECS, Penn State, University Park, PA, 16802, U.S.A. djmiller@engr.psu.edu.
  • Zhonghui Gu
    Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
  • Christine Orengo
    Institute of Structural and Molecular Biology, University College London, London, UK. c.orengo@ucl.ac.uk.