Timer: Generative Pre-trained Transformers Are Large Time Series Models (2402.02368v3)
Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can hit performance bottlenecks in real-world data-scarce scenarios, a limitation that can be concealed by the performance saturation of small models on current benchmarks. Meanwhile, large models have demonstrated great power in such scenarios through large-scale pre-training. Continuous progress has been made with the emergence of large language models, which exhibit unprecedented abilities such as few-shot generalization, scalability, and task generality that are absent in small deep models. To change the status quo of training scenario-specific small models from scratch, this paper aims at the early development of large time series models (LTSMs). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into the single-series sequence (S3) format, and develop a GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is the Time Series Transformer (Timer), which is generatively pre-trained by next-token prediction and adapted to various downstream tasks with promising capabilities as an LTSM. Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.
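The abstract compresses several design decisions: heterogeneous series are unified into single-series sequences (S3 format), tokenized, and modeled by a GPT-style decoder trained with next-token prediction. Below is a minimal sketch of that pipeline under simple assumptions (per-variate normalization and non-overlapping patch tokens); names such as `to_s3`, `TinyTimer`, and `patch_len` are illustrative and not taken from the paper or its released code.

```python
# Minimal sketch (not the authors' implementation) of the abstract's recipe:
# flatten heterogeneous multivariate series into single-series sequences (S3),
# segment them into patch tokens, and train a decoder-only Transformer with
# next-token (next-patch) prediction.
import torch
import torch.nn as nn

def to_s3(series: torch.Tensor, patch_len: int) -> torch.Tensor:
    """Turn a (variates, time) array into normalized univariate patch tokens."""
    tokens = []
    for variate in series:                              # each variate becomes its own series
        v = (variate - variate.mean()) / (variate.std() + 1e-8)
        n = (v.numel() // patch_len) * patch_len        # drop the ragged tail
        tokens.append(v[:n].reshape(-1, patch_len))     # (num_tokens, patch_len)
    return torch.cat(tokens, dim=0)

class TinyTimer(nn.Module):
    """GPT-style decoder over patch tokens: causal self-attention, regression head."""
    def __init__(self, patch_len=96, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)       # predicts the next patch

    def forward(self, tokens):                          # tokens: (batch, seq, patch_len)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=causal)
        return self.head(h)

# Next-token prediction: shift targets by one token and regress on them.
model = TinyTimer()
tokens = to_s3(torch.randn(3, 96 * 33), patch_len=96).unsqueeze(0)  # toy data
pred = model(tokens[:, :-1])
loss = nn.functional.mse_loss(pred, tokens[:, 1:])
loss.backward()
```

Because the objective is generative, forecasting, imputation, and anomaly detection can all be cast as producing the next (or masked-out) tokens from context, which is the unification the abstract refers to.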