From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis
Journal: arXiv
Published Date: Jul 4, 2025
Abstract
EEG signals capture brain activity with high temporal and low spatial
resolution, supporting applications such as neurological diagnosis, cognitive
monitoring, and brain-computer interfaces. However, effective analysis is
hindered by limited labeled data, high dimensionality, and the absence of
scalable models that fully capture spatiotemporal dependencies. Existing
self-supervised learning (SSL) methods often focus on either spatial or
temporal features, leading to suboptimal representations. To address this, we
propose EEG-VJEPA, a novel adaptation of the Video Joint Embedding Predictive
Architecture (V-JEPA) for EEG classification. By treating EEG as video-like
sequences, EEG-VJEPA learns semantically meaningful spatiotemporal
representations using joint embeddings and adaptive masking. To our knowledge,
this is the first work that exploits V-JEPA for EEG classification and explores
the visual concepts learned by the model. Evaluations on the publicly available
Temple University Hospital (TUH) Abnormal EEG dataset show that EEG-VJEPA
outperforms existing state-of-the-art models in classification accuracy. In
addition, EEG-VJEPA captures physiologically relevant spatial
and temporal signal patterns, offering interpretable embeddings that may
support human-AI collaboration in diagnostic workflows. These findings position
EEG-VJEPA as a promising framework for scalable, trustworthy EEG analysis in
real-world clinical settings.
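
The abstract does not say how EEG is turned into video-like sequences, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a recording stored as a (channels, samples) array, cuts it into fixed-length frames, and splits the frames into visible context and hidden targets. The function names, the 19-channel / 256 Hz shapes, and the 50% mask ratio are hypothetical, and uniform random masking stands in here for the adaptive masking the abstract mentions. In a JEPA-style setup the model would then predict the embeddings of the hidden frames from the embeddings of the visible ones, rather than reconstructing raw signal values.

```python
import numpy as np

def eeg_to_frames(eeg, frame_len):
    """Split a (channels, samples) EEG array into non-overlapping frames.

    Returns an array of shape (n_frames, channels, frame_len), i.e. a
    video-like sequence in which each frame is a channels x time image.
    Trailing samples that do not fill a whole frame are dropped.
    """
    n_channels, n_samples = eeg.shape
    n_frames = n_samples // frame_len
    trimmed = eeg[:, : n_frames * frame_len]
    # (channels, n_frames, frame_len) -> (n_frames, channels, frame_len)
    return trimmed.reshape(n_channels, n_frames, frame_len).transpose(1, 0, 2)

def random_frame_mask(n_frames, mask_ratio, rng):
    """Boolean mask over frames; True marks a frame hidden from the encoder.

    Assumption: uniform random masking, used here as a simple stand-in
    for the adaptive masking named in the abstract.
    """
    n_masked = int(round(n_frames * mask_ratio))
    mask = np.zeros(n_frames, dtype=bool)
    mask[rng.choice(n_frames, size=n_masked, replace=False)] = True
    return mask

rng = np.random.default_rng(0)
# Hypothetical recording: 19 channels, 10 s sampled at 256 Hz (synthetic noise).
eeg = rng.standard_normal((19, 2560))
frames = eeg_to_frames(eeg, frame_len=256)
mask = random_frame_mask(n_frames=frames.shape[0], mask_ratio=0.5, rng=rng)

context = frames[~mask]  # frames the context encoder would see
targets = frames[mask]   # frames whose embeddings would be predicted
```

Masking whole frames is the simplest choice for illustration; masking blocks that span both channels and time would exercise the spatial and temporal structure together, which is the property the abstract emphasizes.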