Information Leakage and Performance Overestimation in EEG-Based Schizophrenia Detection: Evidence from Literature and Empirical Analyses

Journal: medRxiv
Published Date:

Abstract

Detecting schizophrenia (SZ) from electroencephalography (EEG) signals using machine learning and deep learning models has gained traction recently due to its potential utility in early disease detection and differential diagnosis. Reported classification accuracies of 95% and above are common; however, a review of the state-of-the-art literature indicates that ∼65% of published works involve erroneous practices in the evaluation pipeline, such as epoch-based instead of subject-based data splitting, or ranking and selecting features before data partitioning. The resulting information leakage can lead to an overestimation of SZ detection performance. Here we explicitly test this on three open SZ-EEG datasets using gold-standard classification approaches in leaky and leakage-free implementations. Results indicate that information leakage can inflate SZ classification accuracy by up to ∼30%. Accordingly, best practices for EEG-based SZ detection must be established and promoted before this technology can be further developed into a clinical decision-making tool.
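
To make the distinction between epoch-based and subject-based splitting concrete, the following is a minimal illustrative sketch and not the authors' pipeline. The function names, the toy subject labels, and the 25% hold-out fraction are all assumptions chosen for illustration. It shows that shuffling individual epochs places the same subject in both the training and test sets, whereas holding out whole subjects does not.

```python
import random

def epoch_level_split(subject_ids, test_fraction=0.25, seed=0):
    """Leaky variant: shuffle individual epochs, ignoring who they belong to."""
    rng = random.Random(seed)
    idx = list(range(len(subject_ids)))
    rng.shuffle(idx)
    n_test = int(round(test_fraction * len(idx)))
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def subject_level_split(subject_ids, test_fraction=0.25, seed=0):
    """Leakage-free variant: hold out whole subjects, then gather their epochs."""
    rng = random.Random(seed)
    subjects = sorted(set(subject_ids))
    rng.shuffle(subjects)
    n_test = max(1, int(round(test_fraction * len(subjects))))
    test_subjects = set(subjects[:n_test])
    train = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train, test

def shared_subjects(subject_ids, train, test):
    """Subjects that contribute epochs to both the training and the test set."""
    return {subject_ids[i] for i in train} & {subject_ids[i] for i in test}

# Toy data (hypothetical): 10 subjects with 20 epochs each.
subject_ids = [f"sub-{s:02d}" for s in range(10) for _ in range(20)]

leaky_train, leaky_test = epoch_level_split(subject_ids)
clean_train, clean_test = subject_level_split(subject_ids)

leaky_overlap = shared_subjects(subject_ids, leaky_train, leaky_test)
clean_overlap = shared_subjects(subject_ids, clean_train, clean_test)

print("subjects in both sets (epoch-level split):  ", len(leaky_overlap))
print("subjects in both sets (subject-level split):", len(clean_overlap))
```

The same principle applies to feature ranking and selection: in a leakage-free evaluation, those steps would be fitted on the training subjects only and then applied unchanged to the held-out subjects.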

Authors

  • Frigyes Samuel Racz; Gabor Csukly