CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
Journal: arXiv
Published Date: Jun 29, 2025
Abstract
Understanding and decoding brain activity from electroencephalography (EEG)
signals is a fundamental challenge in neuroscience and AI, with applications in
cognition, emotion recognition, diagnosis, and brain-computer interfaces. While
recent EEG foundation models advance generalized decoding via unified
architectures and large-scale pretraining, they adopt a scale-agnostic dense
modeling paradigm inherited from NLP and vision. This design neglects a core
property of neural activity: cross-scale spatiotemporal structure. EEG task
patterns span a wide range of temporal and spatial scales, from short bursts to
slow rhythms, and from localized cortical responses to distributed
interactions. Ignoring this diversity leads to suboptimal representations and
weak generalization. We propose CSBrain, a Cross-scale Spatiotemporal Brain
foundation model for generalized EEG decoding. CSBrain introduces: (i)
Cross-scale Spatiotemporal Tokenization (CST), which aggregates multi-scale
features from localized temporal windows and anatomical brain regions into
compact scale-aware tokens; and (ii) Structured Sparse Attention (SSA), which
captures cross-window and cross-region dependencies, enhancing scale diversity
while removing spurious correlations. CST and SSA are alternately stacked to
progressively integrate multi-scale dependencies. Experiments on 11 EEG tasks
across 16 datasets show that CSBrain consistently outperforms task-specific and
foundation model baselines. These results establish cross-scale modeling as a
key inductive bias and position CSBrain as a robust backbone for future
brain-AI research.
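
To make the two components described above more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that scale-aware tokens can be approximated by pooling each temporal window at several kernel sizes and averaging channels within a region, and that structured sparsity can be approximated by letting a token attend only to tokens sharing its region (cross-window) or its window (cross-region). All function names, the channel-to-region assignment, the scales, and the use of fixed pooling in place of learned layers are hypothetical choices made for illustration.

```python
import numpy as np

def multiscale_tokens(x, window, scales=(1, 2, 4)):
    """x: (channels, time) -> (channels, n_windows, len(scales)).

    Assumption: each window is average-pooled with kernel size k, and the
    standard deviation of the pooled signal serves as a per-scale feature.
    A real model would likely use learned multi-scale convolutions instead.
    """
    c, t = x.shape
    n_win = t // window
    xw = x[:, :n_win * window].reshape(c, n_win, window)
    feats = []
    for k in scales:
        usable = (window // k) * k  # drop samples that do not fill a kernel
        pooled = xw[:, :, :usable].reshape(c, n_win, window // k, k).mean(axis=-1)
        feats.append(pooled.std(axis=-1))
    return np.stack(feats, axis=-1)

def region_pool(tokens, region_of_channel, n_regions):
    """Average channel tokens within each (hypothetical) brain region."""
    out = np.zeros((n_regions,) + tokens.shape[1:])
    for r in range(n_regions):
        out[r] = tokens[region_of_channel == r].mean(axis=0)
    return out

def structured_sparse_mask(n_regions, n_win):
    """Boolean mask over flattened (region, window) tokens.

    Token i may attend to token j only if they share a region
    (cross-window) or share a window (cross-region).
    """
    r = np.repeat(np.arange(n_regions), n_win)
    w = np.tile(np.arange(n_win), n_regions)
    return (r[:, None] == r[None, :]) | (w[:, None] == w[None, :])

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed pairs set to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 8 channels, 400 samples, 4 regions of 2 channels each.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 400))
region_of_channel = np.array([0, 0, 1, 1, 2, 2, 3, 3])

tokens = region_pool(multiscale_tokens(eeg, window=100), region_of_channel, 4)
flat = tokens.reshape(-1, tokens.shape[-1])  # row index = region * n_win + window
mask = structured_sparse_mask(n_regions=4, n_win=4)
out = masked_attention(flat, flat, flat, mask)
```

In this toy setup each of the 16 tokens attends to 7 tokens (4 in its region plus 4 in its window, counting itself once) rather than all 16, which illustrates how a structured mask restricts dense all-to-all attention. The alternating stacking of tokenization and attention described in the abstract is not reproduced here.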