An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (1803.01271v2)

Published 4 Mar 2018 in cs.LG, cs.AI, and cs.CL

Abstract: For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at http://github.com/locuslab/TCN .

Citations (4,287)

Summary

  • The paper demonstrates that TCNs outperform recurrent architectures such as LSTMs and GRUs on several sequence modeling tasks.
  • Methodology highlights include the use of causal and dilated convolutions with residual connections, validated through synthetic and real-world experiments.
  • Implications suggest TCNs offer longer effective memory, faster convergence, and improved performance, advocating them as a strong alternative to recurrent networks.

Convolutional Sequence Modeling

This paper presents an empirical evaluation of temporal convolutional networks (TCNs) against recurrent architectures, specifically LSTMs and GRUs, across a range of sequence modeling tasks. It challenges the prevailing association between sequence modeling and recurrent networks, advocating for TCNs as a viable alternative.

TCN Architecture and Properties

The TCN architecture is built on causal convolutions, which ensure that no information leaks from the future into the past, and it maps an input sequence to an output sequence of the same length. Dilated convolutions give the network a large receptive field, while residual connections make it practical to train deep stacks of such layers.
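As a concrete illustration, the following is a minimal PyTorch sketch of a TCN residual block with causal (left-only) padding and dilated convolutions. It is a simplified reconstruction for illustration rather than the authors' implementation; the official code at github.com/locuslab/TCN additionally uses weight normalization and dropout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D convolution padded only on the left, so the output at time t
    depends only on inputs at times <= t (no future information leaks)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.left_pad, 0)))

class TCNBlock(nn.Module):
    """Residual block: two causal dilated convolutions plus a skip path;
    a 1x1 convolution matches channel counts on the residual branch."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation), nn.ReLU(),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation), nn.ReLU(),
        )
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.net(x) + self.skip(x))

# Stacking blocks with dilations 1, 2, 4, ... grows the receptive field
# exponentially while the output keeps the input's temporal length.
tcn = nn.Sequential(TCNBlock(1, 32, dilation=1),
                    TCNBlock(32, 32, dilation=2),
                    TCNBlock(32, 32, dilation=4))
print(tcn(torch.randn(8, 1, 100)).shape)         # torch.Size([8, 32, 100])
```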

Key advantages of TCNs include parallelism across time steps, flexible receptive field size, stable gradients, and low memory requirements during training. However, TCNs can need more memory at evaluation time, since they must retain the raw input over their effective history rather than a compact hidden state, and they may require architectural adjustments when transferring to domains with different memory requirements.
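The trade-off between receptive field and depth can be made concrete with a small calculation. Assuming the standard configuration described in the paper, in which each residual block applies two convolutions of kernel size k and the dilation doubles at each level (1, 2, 4, ...), the receptive field after n levels is 1 + 2(k − 1)(2^n − 1). The helper below is an illustrative sketch of that relationship.

```python
def tcn_receptive_field(kernel_size: int, num_levels: int, convs_per_level: int = 2) -> int:
    """Receptive field (in time steps) of a TCN whose dilation doubles at
    each level, assuming `convs_per_level` causal convolutions of size
    `kernel_size` per residual block."""
    return 1 + convs_per_level * (kernel_size - 1) * (2 ** num_levels - 1)

# With kernel size 3, eight levels already cover more than 1000 time steps.
print(tcn_receptive_field(kernel_size=3, num_levels=8))   # 1021
```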

Experimental Evaluation

The empirical evaluation encompasses synthetic stress tests (the adding problem, sequential MNIST, permuted MNIST, and copy memory) and real-world tasks (polyphonic music modeling, word-level language modeling, and character-level language modeling). The results demonstrate that TCNs outperform canonical recurrent networks across this diverse range of tasks; in particular, TCNs exhibit longer effective memory than LSTMs and GRUs.
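For concreteness, the snippet below sketches a data generator for the adding problem in its usual formulation: each input has two channels over T time steps, one holding uniform random values and one marking exactly two positions, and the target is the sum of the two marked values. The tensor layout and defaults are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def adding_problem_batch(batch_size: int, seq_len: int, rng=np.random):
    """Generate one batch of the adding problem.
    Returns inputs of shape (batch, 2, seq_len): channel 0 holds uniform
    random values, channel 1 is zero except for two marker positions.
    The scalar target per sequence is the sum of the two marked values."""
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    targets = np.zeros(batch_size)
    for b in range(batch_size):
        i, j = rng.choice(seq_len, size=2, replace=False)
        markers[b, i] = markers[b, j] = 1.0
        targets[b] = values[b, i] + values[b, j]
    return np.stack([values, markers], axis=1), targets

x, y = adding_problem_batch(batch_size=4, seq_len=600)
print(x.shape, y.shape)   # (4, 2, 600) (4,)
```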

Results and Analysis

On the adding problem, TCNs and GRUs converge to near-perfect solutions, while LSTMs and vanilla RNNs perform poorly. For sequential and permuted MNIST, TCNs outperform recurrent architectures in terms of convergence speed and final accuracy. In the copy memory task, TCNs demonstrate superior performance, especially for longer sequence lengths.

In polyphonic music modeling tasks (JSB Chorales and Nottingham), TCNs outperform recurrent models. For word-level language modeling, TCNs outperform GRUs and vanilla RNNs on the Penn Treebank (PTB) corpus and achieve lower perplexities than LSTMs on the larger Wikitext-103 corpus and the LAMBADA dataset. On character-level language modeling, TCNs outperform regularized LSTMs and GRUs. Ablation studies confirm that filter size and residual connections contribute to sequence modeling performance.

Implications and Future Directions

The findings suggest that convolutional networks, particularly TCNs, should be considered a natural starting point for sequence modeling tasks. The paper calls for further research into TCN architectures, including exploration of regularization and optimization techniques. Code is available to encourage further research.

Conclusion

The paper concludes that TCNs offer a competitive alternative to recurrent networks for sequence modeling. The results highlight the ability of convolutional architectures to match or exceed recurrent baselines across a broad range of tasks, challenging the historical dominance of recurrent networks in this domain.
