Informed, but Not Always Improved: Challenging the Benefit of Background Knowledge in GNNs
Journal: arXiv
Published Date: May 16, 2025
Abstract
In complex and low-data domains such as biomedical research, incorporating
background knowledge (BK) graphs, such as protein-protein interaction (PPI)
networks, into graph-based machine learning pipelines is a promising research
direction. However, while BK is often assumed to improve model performance, its
actual contribution and the impact of imperfect knowledge remain poorly
understood. In this work, we investigate the role of BK in an important
real-world task: cancer subtype classification. Surprisingly, we find that (i)
state-of-the-art GNNs using BK perform no better than uninformed models like
linear regression, and (ii) their performance remains largely unchanged even
when the BK graph is heavily perturbed. To understand these unexpected results,
we introduce an evaluation framework that employs (i) a synthetic setting
where the BK is clearly informative and (ii) a set of perturbations that
simulate various imperfections in BK graphs. With this framework, we test the robustness
of BK-aware models in both synthetic and real-world biomedical settings. Our
findings reveal that careful alignment of GNN architectures with BK
characteristics is necessary and, when achieved, holds the potential for
significant performance improvements.
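
To make the idea of perturbing a BK graph concrete, the following is a minimal sketch of one possible perturbation: replacing a chosen fraction of edges with uniformly random ones while keeping the edge count fixed. The abstract does not specify which perturbations the framework uses, so the function name `perturb_edges`, its parameters, and the rewiring scheme itself are illustrative assumptions, not the authors' method.

```python
import random


def perturb_edges(edges, num_nodes, fraction, seed=0):
    """Replace a fraction of undirected edges with uniformly random new edges.

    Hypothetical helper for illustration only; the perturbations used in the
    paper are not described in the abstract.
    """
    rng = random.Random(seed)
    # Normalise to sorted tuples so (u, v) and (v, u) count as the same edge.
    original = {tuple(sorted(e)) for e in edges}
    num_to_replace = round(fraction * len(original))

    # Make sure enough non-edges exist, otherwise the loop below cannot finish.
    max_edges = num_nodes * (num_nodes - 1) // 2
    if num_to_replace > max_edges - len(original):
        raise ValueError("graph is too dense to rewire this many edges")

    # Drop a random subset of the original edges.
    removed = set(rng.sample(sorted(original), num_to_replace))
    kept = original - removed

    # Add the same number of edges that were absent from the original graph.
    added = set()
    while len(added) < num_to_replace:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes, no self-loops
        candidate = (min(u, v), max(u, v))
        if candidate not in original and candidate not in added:
            added.add(candidate)

    return sorted(kept | added)


# Usage: rewire half of the edges of a 10-node ring graph.
ring = [(i, (i + 1) % 10) for i in range(10)]
perturbed = perturb_edges(ring, num_nodes=10, fraction=0.5, seed=1)
```

Sweeping `fraction` from 0 to 1 and retraining a BK-aware model on each perturbed graph would give a rough robustness curve: if accuracy stays flat as the graph degrades, the model is probably not relying on the graph structure.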