The Necessity of Multiple Data Sources for ECG-Based Machine Learning Models.

Journal: Studies in health technology and informatics
PMID:

Abstract

Although interest in machine learning studies is growing significantly, especially in medicine, the imbalance between study results and clinical relevance is more pronounced than ever. The reasons for this include data quality and interoperability issues. Hence, we aimed to examine site- and study-specific differences in publicly available standard electrocardiogram (ECG) datasets, which in theory should be interoperable owing to a consistent 12-lead definition, sampling rate, and measurement duration. We focus on the question of whether even slight study peculiarities can affect the stability of trained machine learning models. To this end, the performance of modern network architectures and of unsupervised pattern detection algorithms is investigated across different datasets. Overall, the goal is to examine how well machine learning results from single-site ECG studies generalize.

Authors

  • Lucas Plagwitz
    Institute for Translational Psychiatry, University of Münster, Münster, Germany.
  • Tobias Vogelsang
    Institute of Medical Informatics, University of Münster, Germany.
  • Florian Doldi
    Department for Cardiology II-Electrophysiology, University Hospital Münster, Germany.
  • Lucas Bickmann
    Institute of Medical Informatics, University of Münster, Münster, Germany.
  • Michael Fujarski
    Institute of Medical Informatics, University of Münster, Münster, Germany.
  • Lars Eckardt
    Department for Cardiology II-Electrophysiology, University Hospital Münster, Germany.
  • Julian Varghese
    Institute of Medical Data Science, Otto-von-Guericke University, Magdeburg, Germany.