
Time Series Data Augmentation for Deep Learning: A Survey (2002.12478v4)

Published 27 Feb 2020 in cs.LG, eess.SP, and stat.ML

Abstract: Deep learning performs remarkably well on many time series analysis tasks recently. The superior performance of deep neural networks relies heavily on a large number of training data to avoid overfitting. However, the labeled data of many real-world time series applications may be limited such as classification in medical time series and anomaly detection in AIOps. As an effective way to enhance the size and quality of the training data, data augmentation is crucial to the successful application of deep learning models on time series data. In this paper, we systematically review different data augmentation methods for time series. We propose a taxonomy for the reviewed methods, and then provide a structured review for these methods by highlighting their strengths and limitations. We also empirically compare different data augmentation methods for different tasks including time series classification, anomaly detection, and forecasting. Finally, we discuss and highlight five future directions to provide useful research guidance.

Authors (7)
  1. Qingsong Wen (139 papers)
  2. Liang Sun (124 papers)
  3. Fan Yang (878 papers)
  4. Xiaomin Song (10 papers)
  5. Jingkun Gao (6 papers)
  6. Xue Wang (69 papers)
  7. Huan Xu (83 papers)
Citations (564)

Summary

  • The paper demonstrates that diverse augmentation techniques, including time-domain, frequency, and learning-based methods, significantly enhance model performance in tasks such as classification, anomaly detection, and forecasting.
  • The paper categorizes augmentation methods into basic, frequency, time-frequency, decomposition-based, and learning-based approaches, offering a systematic taxonomy for effective implementation.
  • The paper highlights future directions like optimal augmentation selection, imbalanced class strategies, and exploring advanced deep generative models to further improve synthetic time series data quality.

Time Series Data Augmentation for Deep Learning: An Analytical Overview

Introduction

The critical dependency of deep learning models on large labeled datasets poses significant challenges for time series applications where such data is limited. The paper, "Time Series Data Augmentation for Deep Learning: A Survey," provides a comprehensive examination of data augmentation methods tailored for time series data, a domain that lags behind others like computer vision in this respect. Time series tasks examined include classification, anomaly detection, and forecasting, emphasizing how well-structured data augmentation strategies can improve model performance.

Taxonomy and Methodologies

The paper proposes a taxonomy for data augmentation techniques, systematically categorized into basic and advanced methods:

  1. Basic Methods: These apply time-domain transformations directly to the raw input, such as window cropping, window warping, flipping, and noise injection.
  2. Frequency Domain: Techniques such as amplitude and phase perturbation operate on the frequency spectrum, exploiting spectral structure that time-domain transforms cannot reach.
  3. Time-Frequency Domain: Methods incorporate transformations like short-time Fourier transform (STFT), enabling explicit consideration of time-localized frequency content.
  4. Advanced Methods:
    • Decomposition-based: Methods such as STL or RobustSTL decompose a time series into trend, seasonal, and residual components; perturbing the residual and recombining the components yields new series.
    • Statistical Generative Models: These models exploit conditional distributions for time series segments, offering synthetic data generation reflective of true underlying structures.
    • Learning-based:
      • Embedding Space: Utilizes encoding operations in a latent space to interpolate or extrapolate new data points.
      • Deep Generative Models: GANs and other architectures like time-oriented variations allow the synthesis of complex time series data.
      • Automated Augmentation: Search techniques such as reinforcement learning automate the discovery of effective augmentation policies.
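
As a concrete illustration of the basic time-domain methods above, the following NumPy sketch implements noise injection, window cropping, flipping, and window warping. Function names and parameter values are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def jitter(x, sigma=0.03, rng=None):
    """Noise injection: add i.i.d. Gaussian noise to each time step."""
    rng = rng or np.random.default_rng()
    return x + rng.normal(0.0, sigma, size=x.shape)

def window_crop(x, crop_len, rng=None):
    """Window cropping: keep a random contiguous sub-window."""
    rng = rng or np.random.default_rng()
    start = rng.integers(0, len(x) - crop_len + 1)
    return x[start:start + crop_len]

def flip(x):
    """Flipping: mirror the series around its mean value."""
    mu = x.mean()
    return 2.0 * mu - x

def window_warp(x, factor=1.5):
    """Window warping: linearly resample to stretch (factor > 1)
    or compress (factor < 1) the time axis."""
    n = len(x)
    new_n = int(n * factor)
    idx = np.linspace(0, n - 1, new_n)
    return np.interp(idx, np.arange(n), x)
```

In practice these operations are composed, e.g. `jitter(window_crop(series, 64))`, and the label is carried over unchanged for classification tasks.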

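The frequency-domain perturbations can likewise be sketched with the real FFT: perturb the amplitude and phase spectra, then invert. The perturbation scales below are illustrative assumptions.

```python
import numpy as np

def frequency_perturb(x, amp_sigma=0.1, phase_sigma=0.1, rng=None):
    """Amplitude and phase perturbation in the frequency domain:
    scale each amplitude by a Gaussian factor, shift each phase by
    Gaussian noise, then reconstruct via the inverse real FFT."""
    rng = rng or np.random.default_rng()
    spec = np.fft.rfft(x)
    amp = np.abs(spec) * (1.0 + rng.normal(0.0, amp_sigma, size=spec.shape))
    phase = np.angle(spec) + rng.normal(0.0, phase_sigma, size=spec.shape)
    return np.fft.irfft(amp * np.exp(1j * phase), n=len(x))
```

With both sigmas set to zero this reduces to an identity transform, which is a convenient sanity check when tuning perturbation strength.
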
Empirical Evaluations

The empirical analysis in the paper demonstrates the effectiveness of data augmentation across various typical time series tasks:

  • Classification: Augmentation yielded accuracy improvements, including on series deliberately corrupted with injected outliers.
  • Anomaly Detection: A notable increase in precision-recall metrics was observed, particularly when combining residual decomposition methods with augmentation.
  • Forecasting: Incorporation of basic augmentation methods improved mean absolute scaled error (MASE) across several datasets.
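
For reference, MASE scales the forecast's mean absolute error by the in-sample mean absolute error of a (seasonal-)naive forecast. A minimal sketch, assuming a period-`m` naive baseline:

```python
import numpy as np

def mase(y_train, y_true, y_pred, m=1):
    """Mean Absolute Scaled Error: out-of-sample forecast MAE divided
    by the in-sample MAE of the seasonal-naive forecast with period m."""
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae
```

Values below 1 mean the forecast beats the naive baseline in-sample; this scale-free property is why MASE is a common choice for comparing across datasets.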

Future Directions

The paper identifies several compelling avenues for further exploration:

  • Time-Frequency Domain Augmentation: There is scope for the enhanced utilization of transformations like wavelet transforms to capture non-stationary dynamics more effectively.
  • Imbalanced Class Strategies: Combining novel augmentation with sample-weighting methods to address class imbalance presents a crucial area for future work.
  • Augmentation Selection: Strategies for optimal selection and combination of augmentation methods remain underexplored, which could lead to improvements in model robustness and generalization.
  • Gaussian Processes: Leveraging Gaussian and Deep Gaussian Processes offers potential for flexible, probabilistic augmentative models.
  • Expanding DGMs: Beyond GANs, exploring additional deep generative models like autoregressive networks or normalizing flows could provide robust data augmentation frameworks.
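
To make the Gaussian-process direction concrete, a zero-mean GP with an RBF kernel can already generate smooth synthetic series. This generic sampling sketch (the kernel choice and jitter value are assumptions, not from the paper) hints at how such models could serve as augmenters:

```python
import numpy as np

def gp_sample(t, length_scale=1.0, sigma=1.0, n_samples=3, rng=None):
    """Draw sample paths from a zero-mean GP with an RBF kernel."""
    rng = rng or np.random.default_rng()
    d = t[:, None] - t[None, :]
    K = sigma**2 * np.exp(-0.5 * (d / length_scale) ** 2)
    K += 1e-8 * np.eye(len(t))  # small jitter for numerical stability
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal((len(t), n_samples))
```

Conditioning such a GP on an observed series (a GP posterior rather than the prior sampled here) would yield augmented series that stay close to the data, which is the flexibility the bullet above alludes to.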

Conclusion

This paper's survey of augmentation methods underscores the critical role they play in enhancing model performance on time series data, despite the underlying challenges of data scarcity and imbalance. Future research in this domain has the potential to significantly advance deep learning methodologies and applications, making effective and efficient data utilization a practical reality.
