GCANet: Enhancing EEG-based auditory attention decoding with temporal frequency GCN and cross attention mechanisms.

Journal: Neuroscience

Abstract

In complex auditory environments, individuals rely on selective auditory attention to focus on a target speaker while suppressing competing sounds, a phenomenon commonly referred to as the cocktail party effect. Auditory attention decoding (AAD) seeks to identify the attended speaker from electroencephalography (EEG) signals. However, most existing approaches overlook the inherent graph-structured nature of EEG. To address this limitation, we propose GCANet, an end-to-end model that integrates a time-frequency graph convolutional network (TFGCN) to capture functional connectivity across brain regions and incorporates a cross-attention mechanism to dynamically enhance interactions between EEG and audio features. Experiments on three publicly available datasets (KUL, DTU, and AVGC) demonstrate that GCANet substantially improves decoding accuracy in both cross-trial and cross-subject evaluations. With a 1-second decision window, GCANet achieves average accuracies of 92.2%, 83.2% and 62.6% in cross-trial settings, and 75.1%, 57.1% and 55.6% in cross-subject settings. Notably, our findings suggest that the alignment between auditory attention and visual cues may introduce gaze-related confounds, which could inadvertently enhance model performance, particularly at shorter decision windows. Furthermore, our analysis indicates that EEG-audio cross-attention highlights consistent involvement of frontal and temporal regions. These findings suggest that the proposed approach can provide useful insights into cross-modal EEG-audio interactions and may inform future research on auditory attention decoding. Code is available at: https://github.com/Mistborn666666/GCANet.
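The two building blocks named in the abstract, a graph convolution over EEG channels and a cross-attention step between EEG and audio features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the linked repository has that). It is a generic single GCN layer with symmetric adjacency normalization, followed by scaled dot-product cross-attention in which EEG node features act as queries and audio frames act as keys and values. The channel count, feature sizes, random adjacency matrix, and all function names are hypothetical placeholders chosen only for illustration.

```python
import numpy as np

def normalized_adjacency(A):
    """Add self-loops and symmetrically normalize: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(X, A_norm, W):
    """One graph convolution, ReLU(A_norm @ X @ W), with X of shape (channels, features)."""
    return np.maximum(A_norm @ X @ W, 0.0)

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_src, kv_src, Wq, Wk, Wv):
    """Scaled dot-product attention: queries from one modality, keys/values from the other."""
    Q, K, V = query_src @ Wq, kv_src @ Wk, kv_src @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return weights @ V, weights

# Hypothetical sizes, not taken from the paper.
rng = np.random.default_rng(0)
n_channels, n_eeg_feat, n_frames, n_audio_feat, d_model = 64, 16, 10, 8, 12

# Placeholder connectivity: a random symmetric, non-negative matrix without self-edges.
A = np.abs(rng.standard_normal((n_channels, n_channels)))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)

eeg = rng.standard_normal((n_channels, n_eeg_feat))    # per-channel EEG features
audio = rng.standard_normal((n_frames, n_audio_feat))  # per-frame audio features

# Graph convolution over channels, then EEG-to-audio cross-attention.
H = gcn_layer(eeg, normalized_adjacency(A), rng.standard_normal((n_eeg_feat, d_model)))
fused, weights = cross_attention(
    H, audio,
    rng.standard_normal((d_model, d_model)),
    rng.standard_normal((n_audio_feat, d_model)),
    rng.standard_normal((n_audio_feat, d_model)),
)
print(fused.shape, weights.shape)
```

Each row of `weights` is one EEG channel's attention distribution over the audio frames. Per-channel weights of this kind are one way a region-level analysis, such as the frontal and temporal involvement reported above, could in principle be read out of a cross-attention model.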
