Uncertainty in deep learning for EEG under dataset shifts.
Journal:
Artificial intelligence in medicine
Published Date:
Feb 2, 2026
Abstract
As artificial intelligence (AI) is increasingly integrated into medical diagnostics, it is essential that predictive models provide not only accurate outputs but also reliable estimates of uncertainty. In clinical applications, where decisions have significant consequences, understanding the confidence behind each prediction is as critical as the prediction itself. Uncertainty modelling plays a key role in improving trust, guiding decision-making, and identifying unreliable outputs, particularly under dataset shift or in out-of-distribution settings. The primary aim of uncertainty metrics is to align model confidence closely with actual predictive performance, so that confidence estimates adjust dynamically to reflect increasing errors or decreasing reliability of predictions. This study investigates how different ensemble learning strategies affect both performance and uncertainty estimation in a clinically relevant task: classifying Normal, Mild Cognitive Impairment, and Dementia from electroencephalography (EEG) data. We evaluated the performance and uncertainty of ensemble methods and Monte Carlo dropout on a large EEG dataset. The models were assessed in three settings: (1) in-distribution performance on a held-out test set, (2) generalisation to three out-of-distribution datasets, and (3) performance under gradual, EEG-specific dataset shifts simulating noise, drift, and frequency perturbation. Ensembles consisting of multiple independently trained models, such as deep ensembles, consistently achieved higher performance on both the in-distribution test set and the out-of-distribution datasets. These models also produced more informative and reliable uncertainty estimates under various types of EEG dataset shifts. These results highlight the benefits of ensemble diversity and independent training for building robust and uncertainty-aware EEG classification models. The findings are particularly relevant for clinical applications, where reliability under distribution shift and transparent uncertainty are essential for safe deployment.
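As a rough illustration of how an ensemble can yield an uncertainty estimate, the sketch below averages the class probabilities of several members and takes the entropy of the average. It is a minimal example under stated assumptions, not the authors' method: the abstract does not say which uncertainty metric was used, and the function names and probability values here are hypothetical.

```python
import math


def ensemble_mean(member_probs):
    """Average class probabilities over ensemble members for one sample.

    member_probs: list of per-member probability lists, each of length n_classes.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a probability vector; 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical softmax outputs of three independently trained members
# for the classes (Normal, Mild Cognitive Impairment, Dementia).
agree = [[0.90, 0.05, 0.05], [0.88, 0.07, 0.05], [0.92, 0.04, 0.04]]
disagree = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

h_agree = predictive_entropy(ensemble_mean(agree))
h_disagree = predictive_entropy(ensemble_mean(disagree))

print(f"entropy when members agree:    {h_agree:.3f}")
print(f"entropy when members disagree: {h_disagree:.3f}")
```

When the members disagree, the averaged distribution flattens and the entropy approaches its maximum of ln(3) for three classes. This is one way an ensemble's uncertainty could rise as inputs drift away from the training distribution.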