Multiplicative Models for Recurrent Language Modeling (1907.00455v1)

Published 30 Jun 2019 in cs.LG, cs.CL, and stat.ML

Abstract: Recently, there has been interest in multiplicative recurrent neural networks for language modeling. Indeed, simple Recurrent Neural Networks (RNNs) encounter difficulties recovering from past mistakes when generating sequences due to high correlation between hidden states. These challenges can be mitigated by integrating second-order terms in the hidden-state update. One such model, the multiplicative Long Short-Term Memory (mLSTM), is particularly interesting in its original formulation because of the sharing of its second-order term, referred to as the intermediate state. We explore these architectural improvements by introducing new models and testing them on character-level language modeling tasks. This allows us to establish the relevance of shared parametrization in recurrent language modeling.
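
For context, a minimal sketch of the shared intermediate state described in the abstract, following the original mLSTM formulation (Krause et al., 2016), is given below. The weight-matrix names are illustrative and may differ from the paper's notation.

% Sketch of the mLSTM update with a shared intermediate (second-order) state m_t.
% Weight-matrix names (W_{mx}, W_{mh}, ...) are illustrative, not the paper's notation.
\begin{aligned}
m_t &= (W_{mx} x_t) \odot (W_{mh} h_{t-1}) && \text{multiplicative intermediate state} \\
\hat{h}_t &= W_{hx} x_t + W_{hm} m_t \\
i_t &= \sigma(W_{ix} x_t + W_{im} m_t), \quad
f_t = \sigma(W_{fx} x_t + W_{fm} m_t), \quad
o_t = \sigma(W_{ox} x_t + W_{om} m_t) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(\hat{h}_t), \qquad
h_t = o_t \odot \tanh(c_t)
\end{aligned}

The point of the shared parametrization is that the single intermediate state m_t replaces h_{t-1} in every gate and in the candidate update, rather than each gate forming its own second-order term.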

Citations (1)
