Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
(2106.13008v5)
Published 24 Jun 2021 in cs.LG and cs.AI
Abstract: Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck. Going beyond Transformers, we design Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We break with the pre-processing convention of series decomposition and renovate it as a basic inner block of deep models. This design empowers Autoformer with progressive decomposition capacities for complex time series. Further, inspired by the stochastic process theory, we design the Auto-Correlation mechanism based on the series periodicity, which conducts the dependencies discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks, covering five practical applications: energy, traffic, economics, weather and disease. Code is available at this repository: \url{https://github.com/thuml/Autoformer}.
The paper introduces Autoformer, a novel model that integrates series decomposition and auto-correlation to enhance long-term forecasting accuracy.
It decomposes time series into trend-cyclical and seasonal components with moving averages, and captures dependencies efficiently through FFT-based auto-correlation.
Experimental results demonstrate significant MSE reductions and superior performance across benchmarks such as ETT and Electricity, as well as on COVID-19 forecasting showcases.
Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
This paper introduces Autoformer, a novel architecture designed for long-term time series forecasting, addressing the challenges of intricate temporal patterns and computational inefficiency in existing Transformer-based models. By integrating series decomposition and an Auto-Correlation mechanism, Autoformer achieves state-of-the-art accuracy across various real-world applications.
Decomposition Architecture
Autoformer departs from traditional Transformer architectures by incorporating a decomposition architecture that separates time series into trend-cyclical and seasonal components. This decomposition is implemented through a series decomposition block, which progressively extracts the long-term stationary trend from intermediate hidden variables. This is accomplished using a moving average operation:
$$\mathcal{X}_t = \mathrm{AvgPool}\big(\mathrm{Padding}(\mathcal{X})\big), \qquad \mathcal{X}_s = \mathcal{X} - \mathcal{X}_t$$
where $\mathcal{X}$ is the input series, $\mathcal{X}_t$ is the trend-cyclical component, and $\mathcal{X}_s$ is the seasonal component (Figure 1). The encoder focuses on modeling the seasonal part, while the decoder accumulates the trend-cyclical components.
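A minimal PyTorch sketch of such a decomposition block is shown below; the module name `SeriesDecomp`, the replication padding at both ends, and the `kernel_size` hyperparameter are illustrative choices rather than details fixed by the paper.

```python
import torch
import torch.nn as nn

class SeriesDecomp(nn.Module):
    """Split a series into trend-cyclical and seasonal parts via a moving average."""
    def __init__(self, kernel_size: int):
        super().__init__()
        self.kernel_size = kernel_size
        # AvgPool1d over the time dimension implements the moving average
        self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=1, padding=0)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, channels)
        # Repeat boundary values so the trend keeps the original length
        front = x[:, :1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        end = x[:, -1:, :].repeat(1, self.kernel_size // 2, 1)
        padded = torch.cat([front, x, end], dim=1)
        trend = self.avg(padded.permute(0, 2, 1)).permute(0, 2, 1)  # X_t
        seasonal = x - trend                                        # X_s = X - X_t
        return seasonal, trend
```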
Figure 1: Autoformer architecture, highlighting the series decomposition blocks and the Auto-Correlation mechanism within the encoder-decoder structure.
The inputs to the decoder are initialized with both seasonal and trend-cyclical parts, refined through stacked Auto-Correlation mechanisms. The encoder-decoder Auto-Correlation utilizes past seasonal information from the encoder to enhance prediction accuracy.
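As a rough sketch under the same assumptions, the decoder inputs can be built by decomposing the tail of the encoder input and padding the forecast horizon with placeholders (zeros for the seasonal part and the series mean for the trend part, as the paper describes); the helper name `init_decoder_inputs` and the `label_len`/`pred_len` arguments are hypothetical.

```python
import torch

def init_decoder_inputs(x_enc, decomp, label_len, pred_len):
    """Build seasonal/trend decoder inputs from the encoder input.

    x_enc: (batch, L, channels); `decomp` is a decomposition block such as
    SeriesDecomp above; `label_len` and `pred_len` are illustrative names.
    """
    # Decompose the latter segment of the encoder input as the known context
    seasonal_init, trend_init = decomp(x_enc[:, -label_len:, :])
    batch, _, channels = x_enc.shape
    # Placeholders for the horizon: zeros (seasonal) and the series mean (trend)
    zeros = torch.zeros(batch, pred_len, channels, device=x_enc.device)
    mean = x_enc.mean(dim=1, keepdim=True).repeat(1, pred_len, 1)
    seasonal_in = torch.cat([seasonal_init, zeros], dim=1)
    trend_in = torch.cat([trend_init, mean], dim=1)
    return seasonal_in, trend_in
```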
Auto-Correlation Mechanism
Instead of self-attention, Autoformer employs an Auto-Correlation mechanism to capture period-based dependencies and aggregate similar sub-series (Figure 2). This mechanism is inspired by stochastic process theory, where the autocorrelation $\mathcal{R}_{\mathcal{X}\mathcal{X}}(\tau)$ measures the time-delay similarity between a time series $\{\mathcal{X}_t\}$ and its lagged version $\{\mathcal{X}_{t-\tau}\}$:
$$\mathcal{R}_{\mathcal{X}\mathcal{X}}(\tau) = \lim_{L \to \infty} \frac{1}{L} \sum_{t=1}^{L} \mathcal{X}_t \, \mathcal{X}_{t-\tau}$$
Figure 2: Illustration of the Auto-Correlation mechanism, showing time delay aggregation using Fast Fourier Transform (FFT) to calculate autocorrelation.
The Auto-Correlation mechanism identifies the $k$ most probable period lengths $\tau_1, \dots, \tau_k$ and aggregates the corresponding sub-series via time delay aggregation:
$$\tau_1, \dots, \tau_k = \underset{\tau \in \{1, \dots, L\}}{\arg\mathrm{Topk}} \, \mathcal{R}_{\mathcal{Q},\mathcal{K}}(\tau)$$
$$\widehat{\mathcal{R}}_{\mathcal{Q},\mathcal{K}}(\tau_i) = \mathrm{SoftMax}\big(\mathcal{R}_{\mathcal{Q},\mathcal{K}}(\tau_1), \dots, \mathcal{R}_{\mathcal{Q},\mathcal{K}}(\tau_k)\big)_i$$
$$\mathrm{Auto\text{-}Correlation}(\mathcal{Q}, \mathcal{K}, \mathcal{V}) = \sum_{i=1}^{k} \mathrm{Roll}(\mathcal{V}, \tau_i)\, \widehat{\mathcal{R}}_{\mathcal{Q},\mathcal{K}}(\tau_i)$$
where $\mathcal{Q}$, $\mathcal{K}$, and $\mathcal{V}$ are the query, key, and value series, respectively, $\mathrm{Roll}(\mathcal{V}, \tau_i)$ is the series rolled by time delay $\tau_i$, and $\widehat{\mathcal{R}}_{\mathcal{Q},\mathcal{K}}(\tau_i)$ is the softmax-normalized confidence (Figure 3). This approach achieves $\mathcal{O}(L \log L)$ complexity by computing the autocorrelation with FFT.
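A simplified, single-head sketch of this mechanism is given below: it estimates the correlation scores via FFT using the Wiener-Khinchin relation, selects the top-$k$ delays, and aggregates rolled values. The function name `auto_correlation`, the tensor layout (batch, length, channels), and the roll direction are assumptions for illustration, not the exact implementation.

```python
import torch

def auto_correlation(q, k, v, top_k: int):
    # q, k, v: (batch, length, channels)
    L = q.shape[1]
    # Wiener-Khinchin: autocorrelation = inverse FFT of the cross power spectrum
    q_fft = torch.fft.rfft(q, dim=1)
    k_fft = torch.fft.rfft(k, dim=1)
    corr = torch.fft.irfft(q_fft * torch.conj(k_fft), n=L, dim=1)  # (batch, L, channels)

    # Average scores over batch and channels, then keep the k most likely delays
    mean_corr = corr.mean(dim=(0, 2))                 # (L,)
    weights, delays = torch.topk(mean_corr, top_k)    # candidate lags tau_1..tau_k
    weights = torch.softmax(weights, dim=0)           # normalized confidences

    # Time delay aggregation: roll V by each selected lag and take a weighted sum
    out = torch.zeros_like(v)
    for w, tau in zip(weights, delays):
        out = out + w * torch.roll(v, shifts=-int(tau), dims=1)
    return out
```

Because the scores for all $L$ lags are obtained from a single FFT/inverse-FFT pair, the cost per layer is $\mathcal{O}(L \log L)$ rather than the $\mathcal{O}(L^2)$ of dense point-wise attention.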
Figure 3: Comparison of Auto-Correlation with various self-attention mechanisms, highlighting its series-wise connections among sub-series at the underlying periods.
Experimental Results
The paper presents extensive experimental results on six real-world benchmarks, including ETT, Electricity, Exchange, Traffic, Weather, and ILI datasets. Autoformer consistently outperforms state-of-the-art models in long-term forecasting scenarios. For instance, under the input-96-predict-336 setting, Autoformer achieves a 74% MSE reduction on the ETT dataset compared to previous methods. In univariate settings, Autoformer demonstrates superior performance on the ETT and Exchange datasets.
Ablation Studies and Model Analysis
Ablation studies validate the effectiveness of the decomposition architecture and the Auto-Correlation mechanism. The decomposition architecture improves the performance of other models, and the Auto-Correlation mechanism outperforms self-attention variants in terms of both accuracy and memory efficiency. Visualizations of learned seasonal components and dependencies further illustrate the model's ability to capture complex temporal patterns (Figures 4 and 5).
Figure 4: Visualization of learned seasonal components $\mathcal{X}_{de}^{M}$ and trend-cyclical components $\mathcal{T}_{de}^{M}$, demonstrating the progressive extraction of trend information.
Figure 5: Visualization of learned dependencies, highlighting the top-6 time delay sizes $\tau_1, \dots, \tau_6$ identified by Auto-Correlation.
Analysis of learned lags reveals that Autoformer can capture complex seasonalities in real-world series, such as monthly, quarterly, and yearly periods in the Exchange dataset, and daily and weekly patterns in the Traffic dataset (Figure 6). Efficiency analysis confirms the $\mathcal{O}(L \log L)$ complexity of Autoformer, demonstrating its advantages in memory and running time compared to self-attention-based models (Figure 7).
Figure 6: Statistics of learned lags, showing density histograms for the top 10 lags learned by the decoder for the input-96-predict-336 task.
Figure 7: Efficiency analysis comparing Auto-Correlation with self-attention mechanisms in terms of memory usage and running time.
Examples of prediction cases are shown for the Exchange dataset in Figure 8, for the ETT dataset under the univariate setting in Figure 9, for COVID-19 data in Figure 10, and for the ETT dataset under the four prediction-length settings in Figures 11-14.
Figure 8: Prediction cases from the Exchange dataset under the input-96-predict-192 setting.
Figure 9: Prediction cases from the ETT dataset under the input-96-predict-720 univariate setting.
Figure 10: Showcases from the second country of COVID-19 under the input-7-predict-15 setting.
Figure 11: Prediction cases from the ETT dataset under the input-96-predict-96 setting.
Figure 12: Prediction cases from the ETT dataset under the input-96-predict-192 setting.
Figure 13: Prediction cases from the ETT dataset under the input-96-predict-336 setting.
Figure 14: Prediction cases from the ETT dataset under the input-96-predict-720 setting.
Conclusion
Autoformer presents a decomposition architecture with an Auto-Correlation mechanism for long-term time series forecasting. By integrating progressive series decomposition with series-level dependency modeling, Autoformer achieves state-of-the-art performance on various real-world datasets. The proposed approach offers a promising direction for future research in time series analysis and forecasting, particularly in applications requiring long-term predictions and the capability to handle intricate temporal patterns.