Neural Text Generation with Unlikelihood Training

Published 12 Aug 2019 in cs.LG, cs.CL, and stat.ML | (1908.04319v2)

Abstract: Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs. While some post-hoc fixes have been proposed, in particular top-$k$ and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the model are poor. In this paper we show that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences containing repeats and frequent words, unlike those from the human training distribution. We propose a new objective, unlikelihood training, which forces unlikely generations to be assigned lower probability by the model. We show that both token and sequence level unlikelihood training give less repetitive, less dull text while maintaining perplexity, giving superior generations using standard greedy or beam search. According to human evaluations, our approach with standard beam search also outperforms the currently popular decoding methods of nucleus sampling or beam blocking, thus providing a strong alternative to existing techniques.


Authors (6)

Citations (529)


Summary

The paper introduces unlikelihood training to penalize repetitive and irrelevant tokens, addressing limitations of likelihood-based models.
It applies both token-level and sequence-level training to reduce n-gram repetition and enhance output diversity.
Empirical evaluations on Wikitext-103 demonstrate improved sequence diversity and reduced repetition through the new training approach.

Neural Text Generation with Unlikelihood Training

The paper "Neural Text Generation with Unlikelihood Training" addresses a well-known weakness of neural text generation models: their tendency to produce dull and repetitive outputs. It proposes a new training objective that directly targets flaws in the standard likelihood objective used to train neural language models.

Core Contributions

The paper argues that standard likelihood training yields models that assign too much probability to sequences containing repeats and frequent words, unlike the human-written training distribution. To counter this, the authors introduce a new objective, unlikelihood training. It still raises the likelihood of the true target tokens, but it also lowers the probability of unwanted tokens (negative candidates), such as tokens that would produce repetition.

The unlikelihood training is applied at two levels:

Token-Level Unlikelihood Training: At each training step, tokens that already appeared in the preceding context (other than the true next token) are treated as negative candidates, and the model is penalized for assigning them probability.
Sequence-Level Unlikelihood Training: The model is penalized on its own decoded continuations for tokens that belong to repeated n-grams, which reduces repetition in generated text and improves its diversity and naturalness.
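The token-level variant can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name, the list-of-lists probability format, and the `alpha` and `eps` parameters are assumptions made for the example, and a practical version would work on batched log-probabilities inside a deep learning framework.

```python
import math


def token_unlikelihood_loss(probs, targets, alpha=1.0, eps=1e-12):
    """Average per-step loss for one sequence (illustrative sketch).

    probs[t][v] is the model probability of token id v at step t,
    targets[t] is the true token id at step t.
    """
    total = 0.0
    for t, target in enumerate(targets):
        # Likelihood term: promote the true next token.
        mle_term = -math.log(max(probs[t][target], eps))
        # Negative candidates: tokens seen earlier in the sequence,
        # excluding the current target so it is never penalized.
        candidates = set(targets[:t]) - {target}
        # Unlikelihood term: push each candidate's probability toward zero.
        ul_term = -sum(math.log(max(1.0 - probs[t][c], eps)) for c in candidates)
        total += mle_term + alpha * ul_term
    return total / len(targets)
```

Setting `alpha=0` in this sketch recovers the ordinary negative log-likelihood, which makes it easy to see how much of the loss comes from the repetition penalty.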

Methodology

The method uses a Transformer-based architecture and adds the unlikelihood component through a loss that combines the usual likelihood term for the target token with an unlikelihood term for the negative candidates. Empirical evaluations on the Wikitext-103 dataset show greater token diversity and less repetition when unlikelihood training is applied.
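Written out, the combined per-step loss described above takes roughly the following form. The notation here is a reconstruction for illustration: $\alpha$ is a weight on the unlikelihood term, $p_\theta$ is the model distribution, and $\mathcal{C}^t$ is the set of negative candidates at step $t$, shown for the token-level case where candidates are the previous context tokens minus the target.

```latex
% Per-step token-level objective: unlikelihood term plus likelihood term.
\mathcal{L}^{t}_{\text{UL-token}}
  = -\alpha \sum_{c \in \mathcal{C}^{t}} \log\bigl(1 - p_\theta(c \mid x_{<t})\bigr)
    \;-\; \log p_\theta(x_t \mid x_{<t}),
\qquad
\mathcal{C}^{t} = \{x_1, \dots, x_{t-1}\} \setminus \{x_t\}
```

The first term grows as the model puts more probability on a negative candidate, while the second term is the standard negative log-likelihood of the true token.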

Numerical Results and Evaluation

The reported results show less repetitive and more varied outputs. The paper tracks metrics such as sequence-level repetition (seq-rep) and next-token prediction accuracy, and complements them with human evaluation:

The seq-rep metric for models employing unlikelihood training dropped significantly compared to the baseline, indicating more diverse sequence generation.
Human evaluations align with the automatic metrics, favoring the generations from models using the proposed approach over traditional likelihood-based models and popular decoding strategies like nucleus sampling.
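A sequence-level repetition score of this kind can be computed as the fraction of duplicate n-grams in a generated continuation. The sketch below assumes that definition (one minus the ratio of unique n-grams to all n-grams) with a default of n = 4; the function name and the whitespace tokenization in the usage line are illustrative choices, not taken from the paper's code.

```python
def seq_rep_n(tokens, n=4):
    """Fraction of n-grams in `tokens` that duplicate an earlier n-gram."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Sequence is shorter than n, so there is nothing to repeat.
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# Illustrative usage on a whitespace-tokenized string.
score = seq_rep_n("the cat sat on the mat the cat sat on the mat".split())
```

A score of 0 means every n-gram in the continuation is distinct, and values closer to 1 indicate heavier looping.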

Implications and Future Directions

The work has both practical and theoretical implications. Because it changes the training objective rather than the decoding procedure, it could replace or complement current generation techniques in applications ranging from chatbots to automated content creation, producing more engaging and human-like text.

Future work might explore combining unlikelihood training with other architectural changes or applying it in more complex multitask settings. Extending it beyond language modeling to tasks such as summarization or translation could also demonstrate broader utility.

Overall, the paper improves the quality of neural text generation by addressing limitations of the training objective itself rather than relying only on decoding-time fixes.
