Large Language Models Are Zero-Shot Time Series Forecasters (2310.07820v3)
Abstract: By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that LLMs such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases for simplicity and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show that GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers and because of its poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.
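As a concrete illustration of the encoding step described in the abstract, the sketch below serializes a numeric series into a spaced digit string that an LLM can continue via next-token prediction, and decodes a sampled continuation back into floats. This is a minimal sketch, not the paper's reference implementation: the function names, the percentile-based rescaling, the fixed two-decimal precision, and the separator choices are illustrative assumptions.

```python
import numpy as np


def serialize(series, precision=2, scale=None, sep=" , ", digit_sep=" "):
    """Encode a 1-D nonnegative series as a string of digits.

    Values are rescaled, rounded to a fixed number of decimal places, and the
    decimal point is dropped, so each value becomes a short run of digits.
    Putting digit_sep between digits nudges tokenizers that merge digit runs
    (e.g. GPT-3's BPE) toward one token per digit.
    (Assumption: nonnegative values; a sign token would be needed otherwise.)
    """
    series = np.asarray(series, dtype=float)
    if scale is None:
        # Illustrative choice: rescale so most values fall in a small range.
        scale = float(np.percentile(np.abs(series), 90)) or 1.0
    tokens = []
    for x in series / scale:
        digits = f"{x:.{precision}f}".replace(".", "")  # e.g. 0.65 -> "065"
        tokens.append(digit_sep.join(digits))
    return sep.join(tokens), scale


def deserialize(text, scale, precision=2, sep=" , ", digit_sep=" "):
    """Invert serialize(): parse a digit string back into floats."""
    values = []
    for chunk in text.strip().split(sep):
        digits = chunk.replace(digit_sep, "")
        values.append(int(digits) / 10**precision * scale)
    return np.array(values)


if __name__ == "__main__":
    history = [0.64, 0.71, 0.83, 0.95, 1.02]
    prompt, scale = serialize(history)
    print(prompt)  # a digit string like "0 6 5 , 0 7 2 , ..." to feed the LLM
    # A sampled continuation is decoded with deserialize(continuation, scale).
    # Because each decoded value corresponds to a bin of width 10**-precision
    # (in rescaled units), the model's discrete probabilities over digit strings
    # can be turned into a continuous density by spreading each bin's mass
    # uniformly over its width, which is the spirit of the second procedure
    # mentioned in the abstract.
```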