Is Limited Participant Diversity Impeding EEG-based Machine Learning?
Journal: arXiv
Published Date: Mar 11, 2025
Abstract
The application of machine learning (ML) to electroencephalography (EEG) has
great potential to advance both neuroscientific research and clinical
applications. However, the generalisability and robustness of EEG-based ML
models often hinge on the amount and diversity of training data. It is common
practice to split EEG recordings into small segments, thereby increasing the
number of samples substantially compared to the number of individual recordings
or participants. We conceptualise this as a multi-level data generation process
and investigate the scaling behaviour of model performance with respect to the
overall sample size and the participant diversity through large-scale empirical
studies. We then use the same framework to investigate the effectiveness of
different ML strategies designed to address limited data problems: data
augmentations and self-supervised learning. Our findings show that model
performance scaling can be severely constrained by participant distribution
shifts and provide actionable guidance for data collection and ML research.
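
The multi-level structure described above (participants, then recordings, then segments) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `segment_recording`, `build_dataset`, and `split_by_participant`, the window and step lengths, and the synthetic data are all assumptions made for illustration. The sketch shows how segmenting inflates the sample count while the number of participants stays fixed, and how a split by participant keeps every participant's segments on one side of the train/test boundary.

```python
import numpy as np


def segment_recording(recording, window, step):
    """Split a (channels, time) array into fixed-length windows.

    Returns an array of shape (n_segments, channels, window).
    """
    n_times = recording.shape[1]
    if n_times < window:
        raise ValueError("recording is shorter than one window")
    starts = range(0, n_times - window + 1, step)
    return np.stack([recording[:, s:s + window] for s in starts])


def build_dataset(recordings, participant_ids, window, step):
    """Segment every recording and track which participant each segment came from."""
    segments, groups = [], []
    for rec, pid in zip(recordings, participant_ids):
        segs = segment_recording(rec, window, step)
        segments.append(segs)
        groups.extend([pid] * len(segs))
    return np.concatenate(segments), np.array(groups)


def split_by_participant(groups, test_fraction, seed=0):
    """Return boolean train/test masks with no participant on both sides."""
    rng = np.random.default_rng(seed)
    participants = np.unique(groups)
    rng.shuffle(participants)
    n_test = max(1, int(round(test_fraction * len(participants))))
    test_mask = np.isin(groups, participants[:n_test])
    return ~test_mask, test_mask


# Synthetic stand-in for EEG: 6 recordings, 4 channels, 1000 time points each.
# Participant 0 contributes two recordings (hypothetical setup).
data_rng = np.random.default_rng(0)
recordings = [data_rng.standard_normal((4, 1000)) for _ in range(6)]
participant_ids = [0, 0, 1, 2, 3, 4]

X, groups = build_dataset(recordings, participant_ids, window=200, step=200)
train_mask, test_mask = split_by_participant(groups, test_fraction=0.4)

print("segments:", X.shape[0], "participants:", len(np.unique(groups)))
```

In this toy setup the sample count grows with the number of windows per recording, while participant diversity is unchanged. That gap between the two quantities is what the abstract distinguishes as overall sample size versus participant diversity.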