CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm
Journal: arXiv
Published Date: Jun 13, 2025
Abstract
The construction of large-scale, high-quality datasets is a fundamental
prerequisite for developing robust and generalizable foundation models in motor
imagery (MI)-based brain-computer interfaces (BCIs). However, EEG signals
collected from different subjects and devices are often plagued by low
signal-to-noise ratio, heterogeneity in electrode configurations, and
substantial inter-subject variability, posing significant challenges for
effective model training. In this paper, we propose CLEAN-MI, a scalable and
systematic pipeline for constructing large-scale, efficient, and accurate
neurodata in the MI paradigm. CLEAN-MI integrates frequency band
filtering, channel template selection, subject screening, and marginal
distribution alignment to systematically filter out irrelevant or low-quality
data and standardize multi-source EEG datasets. We demonstrate the
effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent
improvements in data quality and classification performance.
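
The four stages named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the algorithms, so everything below is an assumption made for illustration: the 8-30 Hz band, the brick-wall FFT filter, the accuracy threshold used for subject screening, and the use of Euclidean alignment (whitening by the mean trial covariance) as one common form of marginal distribution alignment. All function names are hypothetical and are not taken from CLEAN-MI.

```python
import numpy as np

def bandpass_fft(x, fs, low=8.0, high=30.0):
    """Brick-wall band-pass along the last (time) axis via FFT masking.
    The 8-30 Hz default is an assumed MI band, not the paper's setting."""
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(x, axis=-1)
    spec[..., (freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spec, n=n, axis=-1)

def select_channels(x, ch_names, template):
    """Subset and reorder channels (second-to-last axis) to a shared template."""
    idx = [ch_names.index(ch) for ch in template]
    return x[..., idx, :]

def screen_subjects(scores, threshold):
    """Keep subject ids whose quality score exceeds a threshold.
    The score (e.g. a within-subject accuracy) is a hypothetical criterion."""
    return [sid for sid, s in scores.items() if s > threshold]

def euclidean_align(trials):
    """Whiten trials of shape (n_trials, n_channels, n_times) so that the
    mean spatial covariance across trials becomes the identity matrix."""
    covs = np.einsum("nct,ndt->ncd", trials, trials) / trials.shape[-1]
    w, v = np.linalg.eigh(covs.mean(axis=0))
    inv_sqrt = (v / np.sqrt(w)) @ v.T  # V diag(w^-1/2) V^T
    return np.einsum("cd,ndt->nct", inv_sqrt, trials)

# Usage on synthetic data: 20 trials, 4 channels, 2 s at 250 Hz.
rng = np.random.default_rng(0)
fs = 250
raw = rng.standard_normal((20, 4, 500))
ch_names = ["C3", "Cz", "C4", "Fz"]
template = ["C3", "C4", "Cz"]

filtered = bandpass_fft(raw, fs)
templated = select_channels(filtered, ch_names, template)
aligned = euclidean_align(templated)
kept = screen_subjects({"s1": 0.80, "s2": 0.55}, threshold=0.60)
```

Applying the alignment per subject (or per session) gives every source the same mean covariance, which is one way to reduce inter-subject shift before pooling datasets recorded with different devices. A production pipeline would more likely use a proper IIR or FIR filter in place of the FFT mask.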