On-the-fly Data Augmentation for Forecasting with Deep Learning (2404.16918v1)
Abstract: Deep learning approaches are increasingly used to tackle forecasting tasks. A key factor in the successful application of these methods is a sufficiently large training sample, which is not always available. In these scenarios, synthetic data generation techniques are usually applied to augment the dataset. Data augmentation is typically applied before fitting a model. However, these approaches create a single augmented dataset, potentially limiting their effectiveness. This work introduces OnDAT (On-the-fly Data Augmentation for Time series) to address this issue by applying data augmentation during training and validation. Contrary to traditional methods that create a single, static augmented dataset beforehand, OnDAT performs augmentation on-the-fly. By generating a new augmented dataset at each iteration, the model is exposed to constantly changing variations of the data. We hypothesize that this process enables a better exploration of the data space, which reduces the potential for overfitting and improves forecasting performance. We validated the proposed approach using a state-of-the-art deep learning forecasting method and 8 benchmark datasets containing a total of 75,797 time series. The experiments suggest that OnDAT leads to better forecasting performance than both a strategy that applies data augmentation once before training and a strategy that uses no data augmentation. The method and experiments are publicly available.
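To make the on-the-fly idea concrete, below is a minimal, hypothetical sketch of a PyTorch dataset that re-bootstraps the series on every access, so each epoch trains on a different augmented dataset. This is not the authors' implementation: the class name, window logic, and the choice of a plain moving-block bootstrap (Kunsch, 1989) as the augmentation operator are illustrative assumptions; the paper's actual operator and training loop may differ.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class OnTheFlyBootstrapDataset(Dataset):
    """Illustrative sketch (not the paper's code): each access applies a
    fresh moving-block bootstrap before slicing a training window, so no
    two epochs see exactly the same augmented data."""

    def __init__(self, series: np.ndarray, input_size: int, horizon: int,
                 block_size: int = 24):
        self.series = series.astype(np.float32)
        self.input_size = input_size
        self.horizon = horizon
        self.block_size = block_size
        self.window = input_size + horizon
        self.n_windows = len(self.series) - self.window + 1

    def _moving_block_bootstrap(self, x: np.ndarray) -> np.ndarray:
        # Resample overlapping blocks with replacement and concatenate
        # them to rebuild a series of the original length, preserving
        # short-range temporal dependence within each block.
        n, b = len(x), self.block_size
        n_blocks = int(np.ceil(n / b))
        starts = np.random.randint(0, n - b + 1, size=n_blocks)
        return np.concatenate([x[s:s + b] for s in starts])[:n]

    def __len__(self):
        return self.n_windows

    def __getitem__(self, idx):
        # A new bootstrapped series is drawn per access; augmentation
        # therefore happens during training rather than once beforehand.
        boot = self._moving_block_bootstrap(self.series)
        window = boot[idx:idx + self.window]
        x = torch.from_numpy(window[:self.input_size])
        y = torch.from_numpy(window[self.input_size:])
        return x, y
```

Wrapping such a dataset in a standard `torch.utils.data.DataLoader` is enough to expose the model to a stream of changing augmented samples, which is the mechanism the abstract credits with reducing overfitting.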