Mogrifier LSTM (1909.01792v2)

Published 4 Sep 2019 in cs.CL

Abstract: Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
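
The mutual gating of the current input and the previous output can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything below is an assumption: the alternating order (input gated first), the factor of 2 in front of the sigmoid, and the names `mogrify`, `q_mats` and `r_mats` are illustrative and may differ from the authors' implementation. The gated pair would then be fed to an ordinary LSTM step, which is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mogrify(x, h, q_mats, r_mats):
    """Alternately gate the input x by h and the previous output h by x.

    Assumed shapes (hypothetical parameter names):
      q_mats[k]: len(x) rows by len(h) columns, produces a gate over x
      r_mats[k]: len(h) rows by len(x) columns, produces a gate over h
    The number of gating rounds is len(q_mats) + len(r_mats); q_mats must
    have as many entries as r_mats, or exactly one more.
    """
    rounds = len(q_mats) + len(r_mats)
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            # Odd round: the previous output modulates the input.
            gate = matvec(q_mats[i // 2], h)
            x = [2.0 * sigmoid(g) * xi for g, xi in zip(gate, x)]
        else:
            # Even round: the (already gated) input modulates the output.
            gate = matvec(r_mats[i // 2 - 1], x)
            h = [2.0 * sigmoid(g) * hi for g, hi in zip(gate, h)]
    return x, h


# Usage: two-dimensional input, three-dimensional previous output, two rounds.
x0 = [0.5, -1.0]
h0 = [0.1, 0.2, -0.3]
q = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]   # one 2x3 matrix
r = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]  # one 3x2 matrix
x1, h1 = mogrify(x0, h0, q, r)
```

With all-zero gating matrices each gate equals 2 * sigmoid(0) = 1, so the sketch leaves `x` and `h` unchanged; under this assumed scaling, the gating starts near the identity and only departs from a plain LSTM as the matrices are learned.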

Tweets

https://twitter.com/GaborMelis/status/1785751568563765524