
Empirical Study of Mix-based Data Augmentation Methods in Physiological Time Series Data

Published 18 Sep 2023 in cs.LG | (2309.09970v1)

Abstract: Data augmentation is a common practice to help generalization in the procedure of deep model training. In the context of physiological time series classification, previous research has primarily focused on label-invariant data augmentation methods. However, another class of augmentation techniques (i.e., mixup) that emerged in the computer vision field has yet to be fully explored in the time series domain. In this study, we systematically review the mix-based augmentations, including mixup, cutmix, and manifold mixup, on six physiological datasets, evaluating their performance across different sensory data and classification tasks. Our results demonstrate that the three mix-based augmentations can consistently improve the performance on the six datasets. More importantly, the improvement does not rely on expert knowledge or extensive parameter tuning. Lastly, we provide an overview of the unique properties of the mix-based augmentation methods and highlight the potential benefits of using the mix-based augmentation in physiological time series data.

Citations (3)

Summary

  • The paper presents an empirical evaluation showing that mix-based augmentation techniques consistently outperform traditional methods in physiological time series classification.
  • It details the use of mixup, cutmix, and manifold mixup across six datasets, demonstrating improved accuracy and effective handling of class imbalances.
  • t-SNE visualizations indicate that these methods yield more separable feature representations, consistent with the vicinal risk minimization principle underlying mixup.

Empirical Evaluation of Mix-based Data Augmentation Methods in Physiological Time Series Data

Introduction

Data augmentation is an established technique for improving model generalization by simulating variety within a dataset. Traditional approaches for time series apply transformations such as jittering, scaling, and permutation, which are intended to leave the label unchanged. However, many of these methods are dataset-dependent and may introduce alterations that damage the integrity of physiological signals. This paper explores three mix-based augmentation techniques that originated in the computer vision domain (mixup, cutmix, and manifold mixup) and assesses their utility in physiological time series classification tasks.

Figure 1: The overview of mix-based augmentation procedure for physiological time series classification.

Methodology

The study evaluates three mix-based augmentations:

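  • Mixup: Generates virtual samples by linearly interpolating pairs of inputs and their labels.
  • Cutmix: Replaces a segment of one input with the corresponding segment from another, mixing the labels in proportion to the segment length.
  • Manifold Mixup: Applies the same interpolation to hidden feature maps inside the neural network rather than to the raw inputs.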
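The three operations can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the batch layout `(batch, channels, time)`, the one-hot label format, the default `alpha` values, the function names, and the toy one-hidden-layer network used for manifold mixup are all assumptions made for the example. In particular, the manifold mixup sketch always mixes at a single fixed hidden layer, which is a simplification.

```python
import numpy as np


def mixup(x, y, alpha=0.2, rng=None):
    """Mixup: interpolate whole signals and their one-hot labels.

    x: (batch, channels, time) array, y: (batch, classes) one-hot array.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))   # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))        # random partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix


def cutmix(x, y, alpha=1.0, rng=None):
    """Cutmix for 1-D signals: paste a time segment from a partner sample."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(x))
    n_time = x.shape[-1]
    cut_len = int(round((1.0 - lam) * n_time))          # length of pasted segment
    start = int(rng.integers(0, n_time - cut_len + 1))  # segment start index
    x_mix = x.copy()
    x_mix[..., start:start + cut_len] = x[perm][..., start:start + cut_len]
    # Label weight follows the fraction of the signal actually kept.
    lam_eff = 1.0 - cut_len / n_time
    y_mix = lam_eff * y + (1.0 - lam_eff) * y[perm]
    return x_mix, y_mix


def manifold_mixup_forward(x, y, w1, w2, alpha=2.0, rng=None):
    """Manifold mixup on a toy network: mix hidden features, not raw inputs.

    w1 and w2 are hypothetical weight matrices of a one-hidden-layer MLP.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(x))
    hidden = np.maximum(x.reshape(len(x), -1) @ w1, 0.0)  # ReLU features
    hidden_mix = lam * hidden + (1.0 - lam) * hidden[perm]
    logits = hidden_mix @ w2
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return logits, y_mix
```

In all three cases the mixed labels are soft targets, so training would use a loss that accepts probability vectors (for example, cross-entropy against `y_mix`) rather than integer class indices.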

These methods are applied across six diverse physiological datasets, with results compared against baseline models and traditional augmentation techniques. Mix-based methods are of particular interest because they do not rely on dataset-specific expert knowledge or extensive parameter tuning.

Figure 2: Illustrations of traditional time series data augmentations.

Empirical Evaluation

Performance Across Datasets: In experiments, mix-based augmentation methods consistently outperformed traditional augmentation techniques. Mixup and its variants showed improved accuracy and less dependency on expert parameter tuning. Specifically, cutmix and manifold mixup yielded superior results on several of the six datasets (PTB-XL, Apnea-ECG, Sleep-EDF, MMIDB, PAMAP2, and UCI-HAR).

Case Analysis with PTB-XL Dataset: The PTB-XL dataset, characterized by class imbalance, was used to further profile these methods. Mix-based augmentations not only improved classification metrics but also enhanced predictions for minority classes when paired with a class-balanced sampler (Figure 3).

Figure 3: (a): The PTB-XL validation accuracy computed after each training epoch, with baseline and variations of mixup. (b): The scatter plot of validation accuracy against F1 score for all 80 profiling experiments on PTB-XL dataset.
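One common way to build a class-balanced sampler is to draw training indices with probability inversely proportional to class frequency, so that each class is equally likely to appear in a batch. The sketch below illustrates that idea only; the function name and the inverse-frequency weighting are assumptions, and the sampler used in the paper may differ.

```python
import numpy as np


def class_balanced_indices(labels, n_samples, rng=None):
    """Sample indices so every class is drawn with equal total probability.

    labels: 1-D array-like of integer class ids, one per training example.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    # Each example is weighted by 1 / (size of its class).
    weight_per_class = 1.0 / counts
    sample_weights = weight_per_class[np.searchsorted(classes, labels)]
    probs = sample_weights / sample_weights.sum()
    # Sampling with replacement lets minority examples repeat within an epoch.
    return rng.choice(len(labels), size=n_samples, replace=True, p=probs)
```

The returned indices can select a batch before a mix-based augmentation is applied, so that minority-class examples take part in the mixed pairs more often than they would under uniform sampling.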

Feature Representation and Interpretation

t-SNE visualizations showed that models trained with mix-based methods produce better-separated classes in feature space, which makes the learned representations easier to interpret and is associated with better generalization (Figure 4). This is consistent with the view that the vicinal risk minimization underlying mixup encourages more class-distinct feature representations.

Figure 4: The t-SNE visualizations of cutmix and vanilla settings after training on PTB-XL.
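For reference, the vicinal risk minimization view of mixup can be written compactly. The notation below is the standard mixup formulation and is not taken from this page: the symbols $\alpha$ (Beta shape parameter), $f$ (model), $\ell$ (loss), and $m$ (number of virtual examples) are introduced here as assumptions for illustration.

```latex
% A virtual example is drawn from the vicinity of two training pairs
\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda) y_j, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha)

% Vicinal risk: average loss over m virtual examples
R_{\nu}(f) = \frac{1}{m} \sum_{k=1}^{m} \ell\big(f(\tilde{x}_k), \tilde{y}_k\big)
```

Minimizing $R_{\nu}$ instead of the empirical risk asks the model to behave linearly between training examples, which is one plausible reason for the smoother class boundaries seen in the t-SNE plots.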

Conclusion

This study underscores the efficacy of mix-based data augmentation techniques in physiological time series classification. Because these methods are agnostic to dataset-specific attributes, they offer robust performance improvements without extensive parameter tuning. Future research will focus on integrating mix-based methods with traditional augmentations and extending their applicability to frequency-domain features in time series data.

