GNN-Suite: a Graph Neural Network Benchmarking Framework for Biomedical Informatics
Journal: arXiv
Published Date: May 15, 2025
Abstract
We present GNN-Suite, a robust, modular framework for constructing and
benchmarking Graph Neural Network (GNN) architectures in computational biology.
GNN-Suite uses Nextflow workflows to standardise experimentation and ensure
reproducibility when evaluating GNN performance. We demonstrate its utility in
identifying cancer-driver genes. Molecular networks are built from
protein-protein interaction (PPI) data in STRING and BioGRID, and nodes are
annotated with features from the PCAWG, PID, and COSMIC-CGC repositories.
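To make the network-construction step concrete, the sketch below turns a list of PPI edges into an adjacency structure, keeps only nodes that carry a feature vector, and applies one GCN-style symmetric-normalised aggregation step (with self-loops). This is a minimal illustration under stated assumptions, not GNN-Suite's implementation. The function names `build_graph` and `gcn_propagate`, the node IDs `g1`...`g9`, and the feature values are hypothetical. A full GCN layer would additionally multiply by a learned weight matrix and apply a nonlinearity.

```python
import math
from collections import defaultdict


def build_graph(edges, features):
    """Undirected adjacency sets from (u, v) PPI edges.

    Edges touching a node without a feature vector are dropped, and every
    annotated node gets a self-loop (as in the GCN renormalisation trick).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u in features and v in features and u != v:
            adj[u].add(v)
            adj[v].add(u)
    for node in features:
        adj[node].add(node)
    return dict(adj)


def gcn_propagate(adj, features):
    """One symmetric-normalised aggregation step:
    h_i = sum over j in N(i) plus i itself of x_j / sqrt(d_i * d_j),
    where each degree d counts the self-loop. No learned weights are applied.
    """
    deg = {n: len(nbrs) for n, nbrs in adj.items()}
    out = {}
    for i, nbrs in adj.items():
        h = [0.0] * len(features[i])
        for j in nbrs:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            for k, x in enumerate(features[j]):
                h[k] += w * x
        out[i] = h
    return out


# Hypothetical toy network: node IDs and feature values are illustrative only.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g9")]  # g9 has no features
features = {"g1": [1.0, 0.0], "g2": [0.0, 1.0], "g3": [1.0, 1.0]}
adj = build_graph(edges, features)
hidden = gcn_propagate(adj, features)
```

Dropping unannotated nodes before aggregation is one possible design choice; a pipeline could instead impute missing features, which would change which genes appear in the graph.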
Our design enables fair comparisons among diverse GNN architectures, including
GAT, GAT3H, GCN, GCN2, GIN, GTN, HGCN, PHGCN, and GraphSAGE, as well as a
baseline logistic regression (LR) model. All GNNs were configured as
standardised two-layer models and trained with uniform hyperparameters
(dropout = 0.2; Adam optimiser with learning rate = 0.01; and an adjusted
binary cross-entropy loss to address class imbalance) for 300 epochs on an
80/20 train-test split. Each model was evaluated over 10 independent runs with
different random seeds to yield statistically robust performance metrics, with
balanced accuracy (BACC) as the primary measure. Notably, GCN2 achieved the
highest BACC (0.807 ± 0.035) on a STRING-based network, and all GNN types
outperformed the LR baseline, highlighting the advantage of network-based
learning over feature-only approaches.
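The abstract does not specify how the binary cross-entropy loss is "adjusted" for class imbalance. A common choice, assumed here, is to up-weight the positive (driver-gene) class by the ratio of negatives to positives in the training set, which mirrors the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. The sketch below pairs that assumed weighting with a seeded 80/20 split. The helper names (`train_test_split`, `pos_weight`, `weighted_bce_with_logits`) are hypothetical, and the split is a plain random shuffle rather than a stratified split, which is also an assumption.

```python
import math
import random


def train_test_split(n_nodes, seed, train_frac=0.8):
    """Shuffle node indices with a seeded RNG and cut at train_frac (80/20)."""
    idx = list(range(n_nodes))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * n_nodes)
    return idx[:cut], idx[cut:]


def pos_weight(labels):
    """Assumed adjustment: weight positives by n_neg / n_pos."""
    n_pos = sum(labels)
    return (len(labels) - n_pos) / n_pos


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logits, labels, w_pos):
    """Mean of -[w_pos * y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]."""
    total = 0.0
    for z, y in zip(logits, labels):
        log_p = -softplus(-z)   # log(sigmoid(z))
        log_1mp = -softplus(z)  # log(1 - sigmoid(z))
        total += -(w_pos * y * log_p + (1 - y) * log_1mp)
    return total / len(logits)


# Hypothetical usage: 100 nodes, 25 of them positive, split with seed 0.
labels = [1] * 25 + [0] * 75
train_idx, test_idx = train_test_split(100, seed=0)
w = pos_weight([labels[i] for i in train_idx])
```

Because the weight is computed from training labels only, the test fold stays untouched; repeating the split with seeds 0 to 9 would give the 10 independent runs described above.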
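Balanced accuracy averages sensitivity (recall on positives) and specificity (recall on negatives), so a model cannot score well simply by predicting the majority class. The sketch below computes BACC for binary labels and summarises scores across seeded runs as mean ± standard deviation. The function names and toy predictions are hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the abstract does not state which estimator underlies the reported ± values.

```python
import statistics


def balanced_accuracy(y_true, y_pred):
    """BACC = (sensitivity + specificity) / 2 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def summarise_runs(scores):
    """Mean and sample standard deviation across independent seeded runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical predictions on an imbalanced test set (2 positives, 6 negatives).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
bacc = balanced_accuracy(y_true, y_pred)  # (1/2 + 5/6) / 2
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 6/8
```

Here plain accuracy is 0.75, while BACC is about 0.667 because half of the rare positives are missed, which is why BACC suits imbalanced driver-gene labels.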
Our results show that a common framework for implementing and evaluating GNN
architectures aids in identifying not only the best model but also the most
effective means of incorporating complementary data. By making GNN-Suite
publicly available, we aim to foster reproducible research and promote improved
benchmarking standards in computational biology. Future work will explore
additional omics datasets and further refine network architectures to enhance
predictive accuracy and interpretability in biomedical applications.