Time-series ECG Imputation Using a Pattern-Based Masking Framework

Journal: medRxiv
Published Date:

Abstract

Continuous ECG monitoring has become an integral part of modern hospital-based care. However, missing data presents significant challenges in deploying real-time ECG-based predictive systems. Research on imputation techniques for time-series ECG data is limited. Furthermore, the performance of imputation techniques is typically benchmarked using random masking, which may not reflect the real-world missingness patterns encountered in clinical practice. This study aimed to evaluate and benchmark a range of imputation methods, from conventional statistical approaches to state-of-the-art deep learning models, using continuous ECG time-series data under different missingness conditions, including both random (conventional) and observed pattern-based (realistic) missingness. Time-domain features were extracted from continuous 12-lead Holter recordings (ranging from 2.5 to 4 hours per patient) from a pilot cohort of 40 patients. Missingness was introduced using random and pattern-based masking. We compared seven imputation methods: global mean, linear interpolation, K-Nearest Neighbors (KNN), Multiple Imputation by Chained Equations (MICE), softImpute, xgbooSt MIssing vaLues in timE Series (SMILES), and Self-Attention-based Imputation for Time Series (SAITS). Performance was evaluated using mean absolute error (MAE) across masking approaches, missingness levels, and missingness patterns. Overall, the MAEs for all seven imputation methods were higher under pattern-based masking than under random masking. SAITS achieved the best performance across both masking approaches (MAEs of 0.277 and 0.146; standard deviations of absolute error of 0.398 and 0.252 for observed-pattern and random masking, respectively). Simpler methods such as softImpute and KNN showed comparable performance across both masking approaches, particularly at certain missingness levels.
Artificial random masking may underestimate the error of time-series imputation in real-world scenarios. Our findings underscore the importance of context-based imputation strategies (i.e., masking approach and imputation method) and of balancing model complexity with practical considerations (e.g., resources, costs, and level of missingness) for real-time deployment.

Author Summary

Missing data is a common issue in healthcare research, especially in continuous recordings such as electrocardiograms (ECGs), which are used to monitor heart activity over time. When certain measurements are missing, researchers and clinicians often rely on imputation, which is simply a technique to "fill in the blanks." However, how well imputation works can depend on why and how the data went missing in the first place. In this study, we compared several imputation methods, ranging from basic averaging to advanced machine learning techniques, using real patient ECG data. To better reflect the types of missingness observed in clinical settings, we applied a new approach to simulate more realistic data gaps and patterns. We found that some of the more complex models performed better overall, although simpler methods were surprisingly resilient in certain situations. Our findings underscore the importance of aligning the imputation strategy with the specific data challenges. Our work may help guide future use of ECG data in real-time prediction models and improve how missing information is handled in healthcare research and practice.
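
The difference between random and pattern-based masking described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the study's pipeline: the synthetic heart-rate-like series, the use of contiguous gaps as a stand-in for observed missingness patterns, the function names, and all parameter values. Only two of the seven methods (global mean and linear interpolation) are shown, and MAE is computed on the masked positions only.

```python
import numpy as np

def random_mask(n, frac, rng):
    """Mask round(frac * n) positions chosen independently at random."""
    mask = np.zeros(n, dtype=bool)
    k = int(round(frac * n))
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask

def block_mask(n, frac, block_len, rng):
    """Mask the same number of positions, but in contiguous gaps.

    Hypothetical stand-in for pattern-based masking: real patterns would
    be taken from observed missingness, not drawn at random like this.
    """
    mask = np.zeros(n, dtype=bool)
    target = int(round(frac * n))
    while mask.sum() < target:
        # Never add more than the remaining budget, so we end exactly at target.
        length = min(block_len, target - int(mask.sum()))
        start = rng.integers(0, n - length + 1)
        mask[start:start + length] = True
    return mask

def impute_mean(x, mask):
    """Fill masked values with the mean of the observed values."""
    out = x.copy()
    out[mask] = x[~mask].mean()
    return out

def impute_linear(x, mask):
    """Fill masked values by linear interpolation between observed values."""
    idx = np.arange(len(x))
    out = x.copy()
    out[mask] = np.interp(idx[mask], idx[~mask], x[~mask])
    return out

def mae(truth, imputed, mask):
    """Mean absolute error over the masked positions only."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))

# Synthetic, slowly varying feature (assumed values, not patient data).
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
truth = 70 + 5 * np.sin(2 * np.pi * t / 200) + rng.normal(0, 0.2, n)

masks = {
    "random": random_mask(n, 0.2, rng),
    "block": block_mask(n, 0.2, 50, rng),
}
for mask_name, mask in masks.items():
    for method_name, impute in [("mean", impute_mean), ("linear", impute_linear)]:
        score = mae(truth, impute(truth, mask), mask)
        print(f"{mask_name:>6} masking, {method_name:>6} imputation: MAE = {score:.3f}")
```

In this toy setting, linear interpolation tends to look much better under random masking than under contiguous gaps, because isolated missing points have close observed neighbors while long gaps do not. That mirrors the direction of the effect reported in the abstract, although the magnitudes here say nothing about the study's results.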

Authors

  • Suba, S.
  • Novak, A.
  • Xia, X.
  • Al-Zaiti, S. S.
  • Pelter, M. M.