Emergent Mind

Understanding Different Design Choices in Training Large Time Series Models

Published Jun 20, 2024 in cs.LG and cs.AI


Inspired by LLMs, Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), which aim to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have studied and evaluated various design choices aimed at enhancing LTSM training and generalization capabilities, spanning pre-processing techniques, model configurations, and dataset configurations. In this work, we comprehensively analyze these design choices and aim to identify the best practices for training LTSMs. Moreover, we propose "time series prompt", a novel statistical prompting strategy tailored to time series data. Furthermore, based on the observations in our analysis, we introduce LTSM-bundle, which bundles the best design choices we have identified. Empirical results demonstrate that LTSM-bundle achieves superior zero-shot and few-shot performance compared to state-of-the-art LTSMs and traditional TSF methods on benchmark datasets.

Important design choices in training the LTSM-Bundle framework.


  • The paper explores various design choices in training Large Time Series Models (LTSMs) for time series forecasting, highlighting the challenges posed by the heterogeneity in time series data.

  • It introduces a novel statistical prompting strategy called the 'Time Series Prompt' and a bundle of the best-performing design choices termed the 'LTSM-bundle,' both showing superior performance in empirical tests.

  • The study compares different pre-processing techniques, tokenizations, training paradigms, base model selections, and dataset configurations, ultimately demonstrating how these choices impact model performance.



Time series forecasting (TSF) remains a fundamental task in time series analysis: predicting future data points from historical values. Over the years, TSF methodologies have evolved from traditional statistical techniques to machine learning and, more recently, to deep learning. Because transformers excel at sequence modeling, they have been applied to TSF tasks, especially long-term forecasting.

Drawing inspiration from LLMs, researchers are now exploring Large Time Series Models (LTSMs) built on transformer-based architectures for TSF. However, training LTSMs presents unique challenges due to the heterogeneity of time series data. Datasets vary in frequency, dimensionality, and patterns, which makes it hard to train LTSMs that generalize across them.

This paper provides a comprehensive analysis of various design choices in training LTSMs, spanning pre-processing techniques, model configurations, and dataset configurations. Additionally, the authors propose a novel statistical prompting strategy called the "Time Series Prompt" and introduce the "LTSM-bundle", which combines the best-performing design choices they identified. The empirical results demonstrate the superior performance of LTSM-bundle in zero-shot and few-shot settings compared to state-of-the-art LTSMs.


Pre-processing: Instruction Prompts

The pre-processing step aims to enable LTSMs to better adapt to time series datasets. Two types of prompts are studied:

  1. Text Prompts: Task-specific information formatted into text.
  2. Time Series Prompts: A novel approach introduced in this paper. These prompts are generated by extracting statistical features from the training dataset, providing a robust statistical description of each dataset.
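To make the idea of a statistical prompt concrete, here is a minimal sketch, assuming the training split is a NumPy array of shape [num_series, length]. The feature set (mean, standard deviation, min, max, skewness, excess kurtosis, lag-1 autocorrelation), the averaging across series, and the helper names `time_series_prompt` and `prepend_prompt` are hypothetical illustrations, not the paper's exact feature list or interface.

```python
import numpy as np


def time_series_prompt(train_data: np.ndarray) -> np.ndarray:
    """Summarize a training split of shape [num_series, length] as one prompt vector.

    Hypothetical feature set: mean, std, min, max, skewness, excess kurtosis,
    and lag-1 autocorrelation, computed per series and then averaged.
    """
    x = np.asarray(train_data, dtype=np.float64)
    mean = x.mean(axis=1)
    std = x.std(axis=1)
    centered = x - mean[:, None]
    # Guard against constant series to avoid division by zero.
    z = centered / np.where(std > 0, std, 1.0)[:, None]
    skew = (z ** 3).mean(axis=1)
    kurt = (z ** 4).mean(axis=1) - 3.0
    energy = (centered ** 2).sum(axis=1)
    acf1 = (centered[:, 1:] * centered[:, :-1]).sum(axis=1) / np.where(energy > 0, energy, 1.0)
    per_series = np.stack(
        [mean, std, x.min(axis=1), x.max(axis=1), skew, kurt, acf1], axis=1
    )
    # One dataset-level description, shared by every input window from this dataset.
    return per_series.mean(axis=0)


def prepend_prompt(window: np.ndarray, prompt: np.ndarray) -> np.ndarray:
    """Place the statistical prompt in front of the look-back window."""
    return np.concatenate([prompt, window])


rng = np.random.default_rng(0)
train = rng.normal(size=(3, 200))
prompt = time_series_prompt(train)
model_input = prepend_prompt(train[0, -96:], prompt)
print(prompt.shape, model_input.shape)  # (7,) (103,)
```

In this sketch the prompt is computed once per dataset, so every window drawn from that dataset carries the same seven extra values.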

Empirical results indicate that time series prompts outperform text prompts, yielding up to 8% lower MAE scores. Additionally, the use of time series prompts results in up to 3% lower MSE scores when compared to scenarios without prompts.
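The comparisons in this summary are stated in MSE and MAE, and claims such as "up to 8% lower MAE" are relative reductions. A minimal sketch of both metrics and of that percentage calculation follows; the helper names and toy arrays are illustrative assumptions, not values from the paper.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error over all forecast points."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


def mae(y_true, y_pred) -> float:
    """Mean absolute error over all forecast points."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which the candidate's error is below the baseline's (0.08 means 8% lower)."""
    return (baseline - candidate) / baseline


y = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.5, 2.0, 2.0, 4.0])
print(mse(y, pred), mae(y, pred))  # 0.3125 0.375
print(f"{relative_reduction(0.5, 0.46):.0%}")  # 8%
```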

Pre-processing: Tokenizations

This section evaluates linear tokenization and time series tokenization approaches:

  1. Linear Tokenization: Uses a trainable linear layer to map raw time series values into token embeddings.
  2. Time Series Tokenization: Converts continuous time series data into discrete tokens using a trainable function.

Linear tokenization proved more effective than time series tokenization in training LTSMs, leading to higher performance across diverse datasets.
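The two tokenizers can be contrasted in a short NumPy sketch. The linear tokenizer below splits the series into non-overlapping patches and projects each patch with a weight matrix (randomly initialized here; it would be learned during training). For the discrete side it uses mean scaling followed by uniform binning, a simplified, non-trainable stand-in in the spirit of Chronos-style tokenizers rather than the paper's tokenizer. The patch length, model width, bin count, and value range are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


class LinearTokenizer:
    """Maps each patch of raw values to a d_model-dimensional embedding."""

    def __init__(self, patch_len: int, d_model: int):
        self.patch_len = patch_len
        # Trainable parameters in a real model; randomly initialized here.
        self.weight = rng.normal(scale=0.02, size=(patch_len, d_model))
        self.bias = np.zeros(d_model)

    def __call__(self, series: np.ndarray) -> np.ndarray:
        n = len(series) // self.patch_len  # drop any incomplete trailing patch
        patches = series[: n * self.patch_len].reshape(n, self.patch_len)
        return patches @ self.weight + self.bias  # shape: [n, d_model]


def quantize_tokens(series: np.ndarray, num_bins: int = 4096,
                    low: float = -15.0, high: float = 15.0) -> np.ndarray:
    """Discrete tokenization: mean-scale the series, then assign each value a bin ID."""
    scale = float(np.mean(np.abs(series)))
    scaled = series / (scale if scale > 0 else 1.0)
    # num_bins - 1 interior edges give IDs in [0, num_bins - 1];
    # values outside [low, high] fall into the first or last bin.
    edges = np.linspace(low, high, num_bins + 1)[1:-1]
    return np.digitize(scaled, edges)


series = np.sin(np.linspace(0, 8 * np.pi, 96))
embeddings = LinearTokenizer(patch_len=16, d_model=768)(series)
token_ids = quantize_tokens(series)
print(embeddings.shape, token_ids.shape)  # (6, 768) (96,)
```

One difference visible in the sketch: the linear path keeps values continuous, whereas binning discards any precision finer than a bin width.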

Model Configuration: Training Paradigm

Three distinct training paradigms are compared:

  1. Fully Fine-tuning: Initializing from pre-trained weights and updating all parameters.
  2. Training from Scratch: Initializing all model parameters from scratch.
  3. LoRA Fine-tuning: Using low-rank adapters to fine-tune a limited number of parameters.

Fully fine-tuning emerged as the most effective strategy, offering significantly lower MSE and MAE scores compared to training from scratch and LoRA fine-tuning.
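The paradigms differ mainly in which parameters are updated and how they start. The sketch below uses the standard LoRA formulation, y = Wx + (alpha / r) * BAx with B initialized to zero, on a single 768 x 768 projection (the hidden width of GPT-2-Small). The rank r = 8 and scaling alpha = 16 are arbitrary assumptions, and a single layer stands in for a whole model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in = d_out = 768  # hidden width of GPT-2-Small
r, alpha = 8, 16    # hypothetical LoRA rank and scaling

# Pre-trained weight: updated under full fine-tuning, frozen under LoRA.
# Training from scratch has the same shape and count, but starts from random values.
W = rng.normal(scale=0.02, size=(d_out, d_in))

# LoRA adapters: A is random, B starts at zero so the adapted layer initially equals W.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))


def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha / r) * B A x; only A and B would receive gradients."""
    return W @ x + (alpha / r) * (B @ (A @ x))


full_ft_trainable = W.size        # every entry of W
lora_trainable = A.size + B.size  # only the low-rank factors
print(full_ft_trainable, lora_trainable)  # 589824 12288

x = rng.normal(size=d_in)
y_lora = lora_forward(x)  # equals W @ x until B is trained
```

For this one layer, LoRA trains about 2% of the parameters that full fine-tuning updates; that efficiency is what it trades against the error gap reported above.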

Model Configuration: Base Model Selection

Four pre-trained models were evaluated as potential backbones for LTSMs:

  1. GPT-2-Small
  2. GPT-2-Medium
  3. GPT-2-Large
  4. Phi-2

GPT-2-Medium and GPT-2-Small outperformed GPT-2-Large, particularly in short-term and long-term forecasting, respectively, suggesting that these smaller backbones are less prone to overfitting.
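Swapping the backbone also changes the width that the tokenizer and forecasting head must match. The sketch below lists the publicly documented hidden sizes and approximate parameter counts of the four candidates and computes the size of a linear tokenizer plus a flatten-then-linear head around each one. That adapter layout, and the patch length, patch count, and horizon, are assumptions for illustration rather than the LTSM-bundle configuration.

```python
# Publicly documented hidden sizes and approximate parameter counts of the candidates.
BACKBONES = {
    "GPT-2-Small": {"d_model": 768, "params": 124e6},
    "GPT-2-Medium": {"d_model": 1024, "params": 355e6},
    "GPT-2-Large": {"d_model": 1280, "params": 774e6},
    "Phi-2": {"d_model": 2560, "params": 2.7e9},
}


def adapter_params(d_model: int, patch_len: int = 16,
                   num_patches: int = 6, horizon: int = 96) -> int:
    """Hypothetical adapters: linear tokenizer in, flatten-then-linear forecasting head out."""
    tokenizer = patch_len * d_model + d_model         # weights + bias
    head = num_patches * d_model * horizon + horizon  # weights + bias
    return tokenizer + head


for name, cfg in BACKBONES.items():
    backbone_m = cfg["params"] / 1e6
    print(f"{name:<13} backbone ~{backbone_m:,.0f}M params, "
          f"adapters {adapter_params(cfg['d_model']):,}")
```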

Dataset Configuration: Quantity and Diversity

The impact of data quantity and diversity on model performance was also examined:

  1. Data Quantity: Various down-sampling rates (10%, 5%, 2.5%) were compared.
  2. Diversity: Models were trained on an increasing number of datasets to evaluate performance improvements.

Using 5% of the training data generally provided the best balance for model granularity and performance. Increasing dataset diversity consistently improved model generalizability.
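One way to realize both knobs is sketched below: slice each dataset into (look-back, horizon) windows, keep a fixed fraction of them, and pool the survivors from several datasets into one training set. Strided subsampling, the window lengths, and the dataset names are assumptions; the summary does not specify how the paper selects the retained samples.

```python
import numpy as np


def sliding_windows(series: np.ndarray, context_len: int, horizon: int):
    """All (look-back, target) pairs with stride 1."""
    count = max(len(series) - context_len - horizon + 1, 0)
    return [
        (series[i : i + context_len], series[i + context_len : i + context_len + horizon])
        for i in range(count)
    ]


def downsample(windows, rate: float):
    """Keep roughly `rate` of the windows by taking every k-th one (k = 1 / rate)."""
    step = max(1, round(1 / rate))
    return windows[::step]


def build_mixture(datasets: dict, rate: float, context_len: int = 96, horizon: int = 96):
    """Pool down-sampled windows from every dataset; adding datasets adds diversity."""
    mixture = []
    for name, series in datasets.items():
        for x, y in downsample(sliding_windows(series, context_len, horizon), rate):
            mixture.append((name, x, y))
    return mixture


datasets = {"dataset_a": np.arange(1000.0), "dataset_b": np.arange(2000.0)}
mixture = build_mixture(datasets, rate=0.05)  # the 5% setting from the text
print(len(mixture))  # 132
```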

Comparison with State-of-the-Art Methods

The LTSM-bundle demonstrated superior performance across various benchmarks in zero-shot and few-shot settings, outperforming state-of-the-art baselines such as PatchTST and DLinear.

Conclusion and Future Directions

This study provides an in-depth analysis of critical design choices in training LTSMs, yielding insights that culminate in the LTSM-bundle. This framework exhibits strong performance with enhanced generalizability and efficiency.

Future work might involve developing more nuanced prompting strategies and exploring synthetic datasets to further enhance LTSMs. Additionally, investigating variate-specific prompts and the integration of more complex statistical descriptions could yield further improvements.

Overall, this work lays substantial groundwork for advancing the field of time series forecasting using large-scale, transformer-based models.
