
On-the-fly Data Augmentation for Forecasting with Deep Learning (2404.16918v1)

Published 25 Apr 2024 in cs.LG and stat.ML

Abstract: Deep learning approaches are increasingly used to tackle forecasting tasks. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. In these scenarios, synthetic data generation techniques are usually applied to augment the dataset. Data augmentation is typically applied before fitting a model. However, these approaches create a single augmented dataset, potentially limiting their effectiveness. This work introduces OnDAT (On-the-fly Data Augmentation for Time series) to address this issue by applying data augmentation during training and validation. Contrary to traditional methods that create a single, static augmented dataset beforehand, OnDAT performs augmentation on-the-fly. By generating a new augmented dataset on each iteration, the model is exposed to constantly changing variations of the data. We hypothesize this process enables a better exploration of the data space, which reduces the potential for overfitting and improves forecasting performance. We validated the proposed approach using a state-of-the-art deep learning forecasting method and 8 benchmark datasets containing a total of 75,797 time series. The experiments suggest that OnDAT leads to better forecasting performance than a strategy that applies data augmentation before training, as well as a strategy that does not involve data augmentation. The method and experiments are publicly available.


Summary

  • The paper introduces OnDAT, a dynamic augmentation method that integrates data synthesis into model training to enhance forecast accuracy.
  • It employs rolling seasonality decomposition and moving block bootstrapping to dynamically generate realistic time series variations.
  • Experimental results on 8 benchmark datasets reveal lower SMAPE scores and improved model robustness compared to static augmentation methods.

Enhancing Forecasting Models with On-the-fly Data Augmentation: Facilitating Deep Learning with OnDAT

Introduction

Recent advances in time series forecasting have highlighted the potential of deep learning models, such as NHITS and N-BEATS, which outperform traditional methods like ARIMA in benchmark tasks. A critical factor in harnessing the full power of these models is the availability of substantial training data. In scenarios with data scarcity, data augmentation can supplement limited data and improve model performance. Conventional augmentation approaches prepare a static augmented dataset prior to training, which may not sufficiently capture the variability of the data generating process. This paper introduces On-the-fly Data Augmentation for Time series (OnDAT), a dynamic augmentation approach that integrates data augmentation directly into the training and validation processes of deep learning models.

Motivation and Background

Deep learning models' success in time series forecasting often hinges on large training datasets, which are not always available. Traditional data augmentation techniques address this limitation by generating synthetic data before model training. However, this static approach might fail to comprehensively explore the data space, potentially resulting in models that are less robust and more prone to overfitting. The proposed method, OnDAT, aims to mitigate these risks by implementing data augmentation dynamically during model training, ensuring a continuous exposure to varied data iterations and fostering a more thorough exploration of the data space.

OnDAT Methodology

OnDAT leverages a unique combination of rolling seasonality decompositions and moving block bootstrapping (MBB) to produce augmented data dynamically during the training process. This method involves:

  • Decomposing each time series in a mini-batch into trend, seasonality, and remainder components.
  • Applying MBB to the remainder to create a synthetic series, maintaining the temporal dependencies.
  • Reconstructing the time series from the augmented remainder and the original trend and seasonal components.
  • Using this freshly augmented batch for model training or validation.

This process is repeated at each training iteration, exposing the model to rich and varied data, which enhances generalization and reduces overfitting.
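The steps above can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it substitutes a simple moving-average trend and periodic-means seasonality for the paper's rolling seasonality decomposition, and the function names (`decompose`, `moving_block_bootstrap`, `ondat_augment`), period, and block size are illustrative choices.

```python
import numpy as np

def decompose(series, period):
    """Split a series into trend, seasonal, and remainder components.
    Crude stand-in for a proper (e.g. STL) decomposition."""
    n = len(series)
    # trend: centered moving average over one seasonal period
    trend = np.convolve(series, np.ones(period) / period, mode="same")
    detrended = series - trend
    # seasonal: mean of each within-period position, tiled to full length
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, n // period + 1)[:n]
    remainder = series - trend - seasonal
    return trend, seasonal, remainder

def moving_block_bootstrap(remainder, block_size, rng):
    """Resample overlapping blocks with replacement, preserving
    short-range temporal dependence within each block."""
    n = len(remainder)
    n_blocks = int(np.ceil(n / block_size))
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [remainder[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]

def ondat_augment(batch, period=12, block_size=8, seed=None):
    """Produce a fresh synthetic version of every series in a mini-batch:
    decompose, bootstrap the remainder, then reconstruct."""
    rng = np.random.default_rng(seed)
    augmented = []
    for series in batch:
        trend, seasonal, remainder = decompose(series, period)
        boot = moving_block_bootstrap(remainder, block_size, rng)
        augmented.append(trend + seasonal + boot)
    return np.stack(augmented)
```

In an on-the-fly setting, `ondat_augment` would be called on each mini-batch inside the training loop, so every epoch sees a different synthetic variant of the data rather than one fixed augmented dataset.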

Experimental Design and Results

The efficacy of OnDAT was assessed across 8 benchmark datasets totaling over 75,000 time series. Key findings include:

  • OnDAT significantly outperforms traditional static data augmentation methods, showcasing lower average SMAPE scores across multiple datasets.
  • The dynamic augmentation process enhances not only the robustness but also the general predictive capabilities of the forecasting models, as evidenced by consistent improvements over baseline methods.
  • Both the training and validation phases benefit from OnDAT, producing models that not only perform well on unseen data but also offer reliable performance estimates during validation.
  • Computational overheads, while present, are not prohibitive, suggesting that OnDAT's benefits can be realized in practical scenarios without excessive costs.
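For reference, SMAPE (symmetric mean absolute percentage error), the evaluation metric cited above, can be computed as follows. This is a standard formulation of the metric, not code from the paper; the handling of an all-zero denominator is an illustrative choice.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent.
    Averages |error| scaled by the mean magnitude of actual and forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # avoid division by zero when both actual and forecast are zero
    ratio = np.where(denom == 0, 0.0, np.abs(y_pred - y_true) / np.where(denom == 0, 1.0, denom))
    return 100.0 * np.mean(ratio)
```

SMAPE is bounded in [0, 200] and scale-independent, which is why it is a common choice for comparing forecast accuracy across many heterogeneous series, as in the benchmarks used here.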

Implications and Future Directions

The introduction of OnDAT marks a significant step forward in the application of data augmentation in time series forecasting. It successfully addresses the limitations of static augmentation methods and paves the way for more accurate and robust forecasting models. Future research could explore the application of OnDAT with different neural network architectures and in conjunction with other data augmentation techniques to further enhance its effectiveness and applicability.

Moreover, there is potential to integrate adaptive mechanisms that fine-tune the augmentation process based on real-time feedback during training, optimizing the quality of synthetic data generated and potentially reducing computational demands.

Conclusion

OnDAT introduces a flexible, dynamic framework for data augmentation that is directly integrated into the training process of forecasting models, enhancing their performance and reliability. Its ability to continuously adapt and generate new data iterations represents a substantial improvement over static augmentation methods, making it a valuable tool for improving forecasts in scenarios with limited data.
