Machine Learning with Imbalanced EEG Datasets using Outlier-based Sampling.
Journal:
Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Published Date:
Jul 1, 2020
Abstract
Epilepsy is a neurological disorder that causes seizures in over 65 million people worldwide. Recently developed implantable therapeutic devices aim to prevent symptoms by applying acute electrical stimulation to the seizure-generating brain region in response to activity detected by on-device machine learning hardware. Many training algorithms require an equal number of examples for each target class (e.g. normal activity and seizures), and performance can suffer if this condition is not satisfied. In the case of epilepsy, poor performance can cause seizures to be missed or stimulation to be applied erroneously. Because clinical EEG recordings contain an abundance of normal (interictal) data while seizures are rare events (less than 1% of the dataset), the data available for training is severely imbalanced. Several conventional pre-processing methods are used to address class imbalance, such as down-sampling the majority class and up-sampling the minority class, but each has performance drawbacks. This paper presents an improved method that reduces the majority class to its most effective interictal outlier samples. Outliers are identified using Exponentially Decaying Memory Signal Energy (EDMSE) features with Isolation Forests and with an ANOVA-based method that compares a moving feature window to a baseline reference window. Outlier-based sampling is tested with two classifiers (KNN and Logistic Regression) and achieves higher accuracy (∼2% increase), fewer false positives (∼38% decrease), and lower latency (∼3 seconds shorter) compared to conventional training set pre-processing methods.
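The ANOVA-based idea of comparing a moving feature window to a baseline reference window can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not define EDMSE, so the recursion e[n] = decay * e[n-1] + (1 - decay) * x[n]^2 is an assumed form, and the function names (edmse, f_statistic, top_outlier_windows, demo), the window lengths, the decay value, and the non-overlapping window step are all hypothetical. The Isolation Forest variant mentioned in the abstract is not covered here.

```python
import random
from statistics import fmean


def edmse(signal, decay=0.9):
    """Running signal energy with exponentially decaying memory.

    ASSUMED form (not given in the abstract):
        e[n] = decay * e[n-1] + (1 - decay) * x[n]**2
    """
    energy, out = 0.0, []
    for x in signal:
        energy = decay * energy + (1.0 - decay) * x * x
        out.append(energy)
    return out


def f_statistic(a, b):
    """One-way ANOVA F statistic for two groups of scalar features."""
    na, nb = len(a), len(b)
    ma, mb = fmean(a), fmean(b)
    grand = (na * ma + nb * mb) / (na + nb)
    # Between-group sum of squares (1 degree of freedom for two groups).
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    # Within-group sum of squares (na + nb - 2 degrees of freedom).
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return ss_between / (ss_within / (na + nb - 2))


def top_outlier_windows(features, baseline_len, window_len, keep):
    """Rank windows by F statistic against the baseline; return the top starts.

    The first `baseline_len` features form the reference window. Windows are
    stepped without overlap here for simplicity (a hypothetical choice).
    """
    baseline = features[:baseline_len]
    scored = []
    for start in range(baseline_len, len(features) - window_len + 1, window_len):
        window = features[start:start + window_len]
        scored.append((f_statistic(baseline, window), start))
    scored.sort(reverse=True)  # largest deviation from baseline first
    return [start for _, start in scored[:keep]]


def demo():
    """Synthetic 'interictal' noise with one high-energy burst at 1200..1299."""
    rng = random.Random(0)
    signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    for i in range(1200, 1300):
        signal[i] *= 6.0
    features = edmse(signal, decay=0.95)
    return top_outlier_windows(features, baseline_len=400, window_len=100, keep=2)
```

In this sketch, the windows returned by top_outlier_windows would be the interictal segments retained for training, while the remaining majority-class windows would be discarded. How many windows to keep, and whether to rank or threshold the statistic, are design choices the abstract does not specify.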