A comprehensive survey and comparative analysis of time series data augmentation in medical wearable computing.

Journal: PloS one

Abstract

Recent advances in hardware technology have spurred a surge in the popularity and ubiquity of wearable sensors, opening up new applications in the medical domain. This proliferation has notably increased the availability of Time Series (TS) data characterizing a patient's behavioral or physiological state, prompting initiatives to apply machine learning and data analysis techniques to these data. Nonetheless, the complexity of data collection and the time it requires remain significant hurdles that limit dataset sizes and hinder the effectiveness of machine learning. Data Augmentation (DA) stands out as a prime solution: it generates synthetic data to mitigate the difficulty of acquiring medical data. DA has been shown to consistently improve performance on image data, and investigations have consequently been carried out to assess DA for TS, in particular for TS classification. However, the current state of DA for TS classification faces several challenges, including methodological taxonomies restricted to the univariate case, insufficient guidance for selecting suitable DA methods, and a lack of conclusive evidence on the amount of synthetic data required to attain optimal results. This paper presents a comprehensive survey of, and experiments on, DA techniques for TS and their application to TS classification. We propose an updated taxonomy spanning three families of Time Series Data Augmentation (TSDA): Random Transformation (RT), Pattern Mixing (PM), and Generative Models (GM). Additionally, we empirically evaluate 12 TSDA methods on diverse datasets used in medical-related applications, including OPPORTUNITY and HAR for Human Activity Recognition, DEAP for emotion recognition, and the BioVid Heat Pain Database (BVDB) and PainMonit Database (PMDB) for pain recognition.
Through comprehensive experimental analysis, we identify the best-performing DA techniques and provide recommendations to researchers on generating synthetic data so as to maximize the benefit of DA methods. Our findings show that, despite their simplicity, DA methods of the RT family are the most consistent at improving performance compared with using no augmentation.
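The RT and PM families differ mainly in how a synthetic sample is derived from real ones. As a rough illustration only, the sketch below implements two common RT operations (additive Gaussian jittering and per-channel scaling) and one simple PM operation (convex interpolation between two multivariate series of the same class). It also includes a hypothetical `augment` helper whose `ratio` parameter controls how much synthetic data is appended to the training set. These are generic operations chosen for illustration, not the 12 methods evaluated in the paper. The noise levels, the 50/50 choice between RT and PM, and all function names are assumptions. GM is omitted because it requires training a generative model.

```python
import random
from typing import Dict, List, Sequence, Tuple

# A multivariate time series: T time steps, each holding C channel values.
Series = List[List[float]]


def jitter(x: Series, rng: random.Random, sigma: float = 0.03) -> Series:
    """RT (illustrative): add i.i.d. Gaussian noise to every value."""
    return [[v + rng.gauss(0.0, sigma) for v in step] for step in x]


def scale(x: Series, rng: random.Random, sigma: float = 0.1) -> Series:
    """RT (illustrative): multiply each channel by its own random factor."""
    factors = [rng.gauss(1.0, sigma) for _ in range(len(x[0]))]
    return [[v * f for v, f in zip(step, factors)] for step in x]


def mix(x1: Series, x2: Series, lam: float) -> Series:
    """PM (illustrative): convex combination of two same-length series."""
    return [[lam * a + (1.0 - lam) * b for a, b in zip(s1, s2)]
            for s1, s2 in zip(x1, x2)]


def augment(X: Sequence[Series], y: Sequence[int], ratio: float,
            seed: int = 0) -> Tuple[List[Series], List[int]]:
    """Hypothetical helper: append round(ratio * len(X)) synthetic samples."""
    rng = random.Random(seed)
    X_out, y_out = list(X), list(y)
    # Index samples by class so PM only mixes series sharing a label.
    by_class: Dict[int, List[int]] = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for _ in range(round(ratio * len(X))):
        i = rng.randrange(len(X))
        if rng.random() < 0.5:
            # RT branch: jitter, then scale, one original sample.
            new = scale(jitter(X[i], rng), rng)
        else:
            # PM branch: interpolate with a random sample of the same class.
            j = rng.choice(by_class[y[i]])
            new = mix(X[i], X[j], rng.uniform(0.0, 1.0))
        X_out.append(new)
        y_out.append(y[i])  # synthetic samples inherit the source label
    return X_out, y_out


if __name__ == "__main__":
    X = [[[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]],
         [[1.0, 0.0], [1.5, 0.5], [2.0, 1.0]]]
    y = [0, 1]
    X_aug, y_aug = augment(X, y, ratio=1.0)
    print(len(X_aug), y_aug[:2])
```

Keeping `ratio` as an explicit parameter makes it straightforward to sweep the amount of synthetic data, which is one of the open questions the abstract raises; the value that works best is not something this sketch can determine.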

Authors

  • Md Abid Hasan
    Department of Computer Science and Engineering, University of California Riverside, 900 University Ave, Riverside, 92507, CA, USA. mhasa006@ucr.edu.
  • Frédéric Li
    Institute of Medical Informatics, University of Lübeck, Ratzeburger Allee 160, Lübeck 23538, Germany. li@imi.uni-luebeck.de.
  • Philip Gouverneur
    Institute of Medical Informatics, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany.
  • Artur Piet
    Institute of Medical Informatics, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany. ar.piet@uni-luebeck.de.
  • Marcin Grzegorzek
    Institute for Vision and Graphics, University of Siegen, Hoerlindstr. 3, 57076 Siegen, Germany.