AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design
Journal:
arXiv
Published Date:
Jul 2, 2025
Abstract
Graph convolutional networks (GCNs) are fundamental in various scientific
applications, ranging from biomedical protein-protein interactions (PPI) to
large-scale recommendation systems. An essential component for modeling graph
structures in GCNs is sparse general matrix-matrix multiplication (SpGEMM). As
the size of graph data continues to scale up, SpGEMMs are often conducted in an
out-of-core fashion due to limited GPU memory space in resource-constrained
systems. Albeit recent efforts that aim to alleviate the memory constraints of
out-of-core SpGEMM through either GPU feature caching, hybrid CPU-GPU memory
layout, or performing the computation in sparse format, current systems suffer
from both high I/O latency and GPU under-utilization issues.
In this paper, we first identify the problems of existing systems, where
sparse format data alignment and memory allocation are the main performance
bottlenecks, and propose AIRES, a novel algorithm-system co-design solution to
accelerate out-of-core SpGEMM computation for GCNs. Specifically, from the
algorithm angle, AIRES proposes to alleviate the data alignment issues on the
block level for matrices in sparse formats and develops a tiling algorithm to
facilitate row block-wise alignment. On the system level, AIRES employs a
three-phase dynamic scheduling that features a dual-way data transfer strategy
utilizing a tiered memory system: integrating GPU memory, GPU Direct Storage
(GDS), and host memory to reduce I/O latency and improve throughput.
Evaluations show that AIRES significantly outperforms the state-of-the-art
methods, achieving up to 1.8x lower latency in real-world graph processing
benchmarks.