Online Data Augmentation for Forecasting with Deep Learning (2404.16918v2)

Published 25 Apr 2024 in cs.LG and stat.ML

Abstract: Deep learning approaches are increasingly used to tackle forecasting tasks involving datasets with multiple univariate time series. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. Synthetic data generation techniques can be applied in these scenarios to augment the dataset. Data augmentation is typically applied offline before training a model. However, when training with mini-batches, some batches may contain a disproportionate number of synthetic samples that do not align well with the original data characteristics. This work introduces an online data augmentation framework that generates synthetic samples during the training of neural networks. By creating synthetic samples for each batch alongside their original counterparts, we maintain a balanced representation between real and synthetic data throughout the training process. This approach fits naturally with the iterative nature of neural network training and eliminates the need to store large augmented datasets. We validated the proposed framework using 3797 time series from 6 benchmark datasets, three neural architectures, and seven synthetic data generation techniques. The experiments suggest that online data augmentation leads to better forecasting performance compared to offline data augmentation or no augmentation approaches. The framework and experiments are publicly available.

Summary

  • The paper introduces OnDAT, a dynamic augmentation method that integrates data synthesis into model training to enhance forecast accuracy.
  • It employs rolling seasonality decomposition and moving block bootstrapping to dynamically generate realistic time series variations.
  • Experimental results on 6 benchmark datasets (3797 time series) show lower SMAPE scores and improved model robustness compared to offline augmentation and no-augmentation baselines.

Enhancing Forecasting Models with On-the-fly Data Augmentation: Facilitating Deep Learning with OnDAT

Introduction

Recent advances in time series forecasting have highlighted the potential of deep learning models, such as NHITS and N-BEATS, which outperform traditional methods like ARIMA in benchmark tasks. A critical factor in harnessing the full power of these models is the availability of substantial training data. In scenarios with data scarcity, data augmentation can supplement limited data and improve model performance. Conventional augmentation approaches prepare a static augmented dataset prior to training, which may not sufficiently capture the variability of the data generating process. This paper introduces On-the-fly Data Augmentation for Time series (OnDAT), a dynamic augmentation approach that integrates data augmentation directly into the training and validation processes of deep learning models.

Motivation and Background

Deep learning models' success in time series forecasting often hinges on large training datasets, which are not always available. Traditional data augmentation techniques address this limitation by generating synthetic data before model training. However, this static approach might fail to explore the data space comprehensively, potentially yielding models that are less robust and more prone to overfitting. The proposed method, OnDAT, aims to mitigate these risks by applying data augmentation dynamically during model training, ensuring continuous exposure to varied synthetic samples and fostering a more thorough exploration of the data space.

OnDAT Methodology

OnDAT combines rolling seasonal decomposition with moving block bootstrapping (MBB) to produce augmented data dynamically during training. The method involves:

  • Decomposing each time series in a mini-batch into trend, seasonality, and remainder components.
  • Applying MBB to the remainder to create a synthetic series, maintaining the temporal dependencies.
  • Reconstructing the time series from the augmented remainder and the original trend and seasonal components.
  • Using this freshly augmented batch for model training or validation.

This process is repeated at every training iteration, exposing the model to rich and varied data, which enhances generalization and reduces overfitting. A minimal sketch of a single augmentation step is given below.
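To make the per-batch procedure concrete, here is a minimal sketch of one OnDAT-style augmentation step. It is not the authors' implementation: STL as the concrete seasonal decomposition, the fixed MBB block size, and the batch-doubling strategy are all illustrative assumptions.

```python
# Minimal sketch of one OnDAT-style augmentation step.
# Assumptions (not from the paper): STL for the decomposition, a fixed
# MBB block size, and pairing each series with one synthetic copy.
import numpy as np
from statsmodels.tsa.seasonal import STL


def moving_block_bootstrap(resid, block_size, rng):
    """Resample a residual series by concatenating randomly drawn
    overlapping blocks, preserving short-range temporal dependence."""
    n = len(resid)
    n_blocks = int(np.ceil(n / block_size))
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    return np.concatenate([resid[s:s + block_size] for s in starts])[:n]


def augment_series(y, period, block_size=8, rng=None):
    """Generate one synthetic variation of a univariate series by
    bootstrapping the remainder of its seasonal decomposition."""
    rng = rng or np.random.default_rng()
    parts = STL(np.asarray(y, dtype=float), period=period).fit()
    synthetic_resid = moving_block_bootstrap(np.asarray(parts.resid),
                                             block_size, rng)
    # Recombine the bootstrapped remainder with the original trend
    # and seasonal components.
    return np.asarray(parts.trend) + np.asarray(parts.seasonal) + synthetic_resid


def augment_batch(batch, period, rng=None):
    """Pair every series in a mini-batch with a fresh synthetic copy,
    keeping real and synthetic samples balanced within the batch."""
    rng = rng or np.random.default_rng()
    synthetic = np.stack([augment_series(y, period, rng=rng) for y in batch])
    return np.concatenate([batch, synthetic], axis=0)


# Example: an online training loop would call augment_batch on every
# iteration, so no augmented dataset is ever stored offline.
toy_batch = np.cumsum(np.random.default_rng(0).normal(size=(16, 96)), axis=1)
online_batch = augment_batch(toy_batch, period=12)  # shape (32, 96)
```

Because each synthetic series reuses the original trend and seasonal structure and only perturbs the remainder, the bootstrapped variants stay close to the original data characteristics while still differing from one iteration to the next.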

Experimental Design and Results

The efficacy of OnDAT was assessed on 3797 time series drawn from 6 benchmark datasets, using three neural architectures and seven synthetic data generation techniques. Key findings include:

  • OnDAT outperforms offline (static) data augmentation and no-augmentation baselines, achieving lower average SMAPE scores across datasets (a standard SMAPE definition is sketched after this list).
  • The dynamic augmentation process improves not only robustness but also the general predictive capability of the forecasting models, as evidenced by consistent improvements over baseline methods.
  • Both the training and validation phases benefit from OnDAT, producing models that not only perform well on unseen data but also offer reliable performance estimates during validation.
  • Computational overheads, while present, are not prohibitive, suggesting that OnDAT's benefits can be realized in practical scenarios without excessive costs.
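For reference, the SMAPE metric cited above is commonly computed as below. This follows the standard symmetric MAPE definition; the paper's exact variant (e.g. how zero denominators are handled) is an assumption here.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Treat points where both actual and forecast are zero as zero error.
    ratio = np.where(denom == 0, 0.0, 2.0 * np.abs(y_pred - y_true) / denom)
    return 100.0 * ratio.mean()
```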

Implications and Future Directions

The introduction of OnDAT marks a significant step forward in the application of data augmentation in time series forecasting. It successfully addresses the limitations of static augmentation methods and paves the way for more accurate and robust forecasting models. Future research could explore the application of OnDAT with different neural network architectures and in conjunction with other data augmentation techniques to further enhance its effectiveness and applicability.

Moreover, there is potential to integrate adaptive mechanisms that fine-tune the augmentation process based on real-time feedback during training, optimizing the quality of synthetic data generated and potentially reducing computational demands.

Conclusion

OnDAT introduces a flexible, dynamic framework for data augmentation that is integrated directly into the training process of forecasting models, enhancing their performance and reliability. Its ability to continuously generate fresh synthetic variations throughout training represents a substantial improvement over static augmentation methods, making it a valuable tool for improving forecasts in scenarios with limited data.
