Finnish Language Modeling with Deep Transformer Models (2003.11562v2)
Published 14 Mar 2020 in cs.CL, cs.LG, cs.SD, eess.AS, and stat.ML
Abstract: Transformers have recently taken center stage in language modeling after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task. We use a sub-word model setting with the Finnish language and compare the results to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is, as far as we know, the first such measure achieved. Transformer-XL improves the perplexity score to 73.58, which is 27% better than the LSTM model.
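For context, pseudo-perplexity scores a masked language model such as BERT by masking each token in turn, scoring the probability the model assigns to the true token, and exponentiating the average negative log-probability. The sketch below illustrates this, assuming the Hugging Face transformers API; the checkpoint name is a publicly available Finnish BERT used only for illustration and is not necessarily the model trained in the paper.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative checkpoint: the paper trains its own Finnish BERT,
# so a public Finnish BERT stands in here as an assumption.
MODEL_NAME = "TurkuNLP/bert-base-finnish-cased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask each token in turn and score the true token.

    PPPL = exp(-(1/N) * sum_i log p(w_i | all other tokens))
    """
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total_log_prob = 0.0
    n_tokens = 0
    # Skip the special [CLS] and [SEP] tokens at the boundaries.
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total_log_prob += log_probs[input_ids[i]].item()
        n_tokens += 1
    return math.exp(-total_log_prob / n_tokens)

print(pseudo_perplexity("Transformerit ovat mullistaneet kielimallinnuksen."))
```

Note that this requires one forward pass per token, so pseudo-perplexity is considerably more expensive to compute than the ordinary perplexity of an autoregressive model like Transformer-XL; the two numbers are also not directly comparable, which is why the paper reports them separately.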