Graph-Agnostic Linear Transformers.
Journal:
Neural networks : the official journal of the International Neural Network Society
Published Date:
Jan 16, 2026
Abstract
Graph Transformers (GTs), as emerging foundational encoders for graph-structured data, have shown promising performance by integrating local graph structures with global attention mechanisms. However, their complex attention functions and the coupling of those functions with graph structures incur significant computational overhead, particularly on large-scale graphs. In this paper, we decouple graph structures from Transformers and propose the Graph-Agnostic Linear Transformer (GALiT). In GALiT, graph structures are used solely to denoise raw node features before training, as our findings reveal that the denoised features already integrate the main information of the graph structure and can replace it in guiding Transformers. By excluding graph structures from the training and inference stages, GALiT serves as a graph-agnostic model, which significantly reduces computational complexity. Additionally, we simplify the linear attention functions inherited from traditional Transformers, which further reduces computational overhead while still capturing the relationships between nodes. Through a weighted combination, we integrate the denoised features into the attention mechanism, as our theoretical analysis reveals the key role of the synergy between linear attention and denoised features in enhancing representation diversity. Despite decoupling graph structures and simplifying attention mechanisms, our model surprisingly outperforms most GNNs and GTs on benchmark graphs. Experimental results indicate that GALiT achieves high efficiency while maintaining or even enhancing performance.
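The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the denoising filter, the simplified attention function, or the form of the weighted combination. The sketch assumes (1) denoising by a few steps of symmetrically normalized adjacency smoothing, (2) a standard kernelized linear attention with an ELU+1 feature map, and (3) a convex combination controlled by a hypothetical scalar `alpha`. All function and parameter names are illustrative.

```python
import numpy as np


def denoise_features(adj, x, steps=2):
    """One-off preprocessing: smooth node features over the graph.

    Assumption: smoothing with D^{-1/2} (A + I) D^{-1/2}; the abstract
    does not say which denoising filter GALiT uses.
    """
    a = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(steps):
        x = a_norm @ x
    return x


def feature_map(x):
    """ELU(x) + 1, a common positive feature map for linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    """Linear attention: phi(Q) (phi(K)^T V), normalized per query.

    Cost is O(N d^2) because the N x N attention matrix is never formed.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                          # (d, d_v) summary of keys/values
    z = phi_q @ phi_k.sum(axis=0)             # (N,) normalizer per query
    return (phi_q @ kv) / z[:, None]


def graph_agnostic_layer(x_denoised, w_q, w_k, w_v, alpha=0.5):
    """Hypothetical layer: no adjacency is used here, only denoised features.

    Assumption: the 'weighted combination' is modelled as
    alpha * attention + (1 - alpha) * projected denoised features.
    """
    q, k, v = x_denoised @ w_q, x_denoised @ w_k, x_denoised @ w_v
    return alpha * linear_attention(q, k, v) + (1.0 - alpha) * v
```

In this sketch, `denoise_features` would be called once before training, and only `graph_agnostic_layer` would run during training and inference, which mirrors the graph-agnostic property the abstract claims.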