Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

Published 10 Nov 2019 in cs.CL | (1911.03860v2)

Abstract: Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address. They tend to produce generations that (i) rely too much on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019) to these cases. We show that appropriate loss functions which regularize generated outputs to match human distributions are effective for the first three issues. For the last important general issue, we show applying unlikelihood to collected data of what a model should not do is effective for improving logical consistency, potentially paving the way to generative models with greater reasoning ability. We demonstrate the efficacy of our approach across several dialogue tasks.

Citations (171)

Summary

The paper introduces unlikelihood training to penalize undesirable outputs in dialogue generation, directly addressing issues like repetition and context copying.
It demonstrates significant improvements by reducing repetition rates (up to 89%) and shifting token usage towards a more human-like distribution.
The study uses NLI data to supply examples of contradictory responses, which the model is trained to make unlikely, improving logical consistency in dialogue.

Unlikelihood Training in Dialogue Generation: An Analytical Approach

The paper "Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training" addresses four recurring problems in generative dialogue models: excessive copying from the context, repetition within utterances, overuse of frequent words, and logical inconsistency. The authors argue that standard maximum likelihood training does not address these problems, since it only raises the probability of observed responses and never explicitly lowers the probability of undesirable ones.

Generative dialogue models often produce text that is repetitive or inconsistent, which makes their dialogue less coherent and limits any appearance of reasoning. To mitigate these issues, the paper extends the unlikelihood loss (Welleck et al., 2019), originally introduced to reduce repetition in text generation, to several dialogue generation problems. Standard likelihood training maximizes the probability of the observed data. Unlikelihood training adds a loss term that penalizes undesirable outputs, such as repetitions and contradictions, and so directly lowers their probability.
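As a rough sketch of the objective described above, the combined loss can be written in the general form of Welleck et al. (2019). The notation here is chosen for illustration and may differ from the paper's: $x$ is the context, $y$ the response, $\mathcal{C}_t$ an assumed set of negative candidate tokens at step $t$, and $\alpha$ an assumed mixing weight.

```latex
\begin{align}
\mathcal{L}_{\mathrm{MLE}}(x, y) &= -\sum_{t=1}^{|y|} \log p_\theta(y_t \mid x, y_{<t}) \\
\mathcal{L}_{\mathrm{UL}}(x, y)  &= -\sum_{t=1}^{|y|} \sum_{c \in \mathcal{C}_t} \log\bigl(1 - p_\theta(c \mid x, y_{<t})\bigr) \\
\mathcal{L}(x, y) &= \mathcal{L}_{\mathrm{MLE}}(x, y) + \alpha \, \mathcal{L}_{\mathrm{UL}}(x, y)
\end{align}
```

The second term grows as the model assigns more probability to a negative candidate, so minimizing it pushes that probability down. The choice of $\mathcal{C}_t$ is what differs between the repetition, vocabulary, and contradiction settings.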

Key Contributions

Repetition and Copying: The authors extend unlikelihood training to reduce copying from the context and repetition within an utterance. They do this by penalizing n-grams in the model's output that repeat the context or an earlier part of the same utterance, which aligns the model output more closely with human dialogue patterns.
Vocabulary Usage Regulation: This paper extends unlikelihood training to address the overuse of high-frequency words and the underutilization of less common words. By penalizing tokens that contribute to this imbalance, the authors demonstrate a marked improvement in the token distribution of model-generated sequences, moving towards human distribution patterns.
Handling Contradictions: To improve logical consistency, the paper uses existing Natural Language Inference (NLI) data to label dialogue pairs as coherent or incoherent. The incoherent pairs serve as examples of what the model should not say, and are used within the unlikelihood framework to train models that better maintain logical consistency in dialogue.
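The repetition case above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the choice of n = 2, and the use of per-token probabilities as plain floats are all assumptions made for the example.

```python
import math


def repeated_ngram_positions(tokens, context, n=2):
    """Return sorted indices of tokens that lie inside an n-gram which
    already occurred in `context` or earlier in `tokens` (hypothetical helper)."""
    # Start with every n-gram that appears in the context.
    seen = {tuple(context[i:i + n]) for i in range(len(context) - n + 1)}
    flagged = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            # Every token of a repeated n-gram becomes a negative candidate.
            flagged.update(range(i, i + n))
        seen.add(gram)
    return sorted(flagged)


def unlikelihood_penalty(token_probs, flagged, eps=1e-12):
    """Sum of -log(1 - p) over flagged positions, where token_probs[i] is the
    probability the model gave to the i-th generated token (assumed input)."""
    return -sum(math.log(max(1.0 - token_probs[i], eps)) for i in flagged)


context = "i like cats".split()
generated = "i like dogs and i like dogs".split()
flagged = repeated_ngram_positions(generated, context, n=2)
penalty = unlikelihood_penalty([0.5] * len(generated), flagged)
```

Here the first "i like" is flagged because it copies the context, and the second "i like dogs" is flagged because it repeats earlier output. In a real training loop the penalty would be computed on the model's differentiable probabilities and added to the likelihood loss with a weight.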

Experimental Results

The authors conducted experiments across several dialogue tasks, including persona-based dialogue (ConvAI2), knowledge-grounded dialogue (Wizard of Wikipedia), and long-form question answering (ELI5). Results show reduced repetition, more varied vocabulary, and improved dialogue consistency. The reported improvements include:

ConvAI2 repetition metrics saw a 69% reduction in context repetition and an 89% reduction in label repetition.
Vocabulary control showed an effective shift of frequency distribution towards rarer words, better matching human standards.
NLI-based contradiction handling improved the model's accuracy at selecting the coherent response over the incoherent one in dialogue pairs.
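One simple way to inspect a vocabulary shift like the one reported above is to bucket tokens by how frequent they are in reference text and compare the share of tokens per bucket between model and human responses. The sketch below assumes a hypothetical precomputed `token_to_bin` mapping and hypothetical bin names; it is not the paper's evaluation code.

```python
from collections import Counter


def bin_shares(tokens, token_to_bin, bins):
    """Fraction of `tokens` falling in each frequency bin.
    Tokens missing from `token_to_bin` are assigned to the last (rarest) bin."""
    counts = Counter(token_to_bin.get(tok, bins[-1]) for tok in tokens)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in bins}


# Hypothetical bins and mapping, for illustration only.
bins = ["frequent", "rare"]
token_to_bin = {"i": "frequent", "the": "frequent", "zebra": "rare"}

shares = bin_shares(["i", "the", "zebra", "quokka"], token_to_bin, bins)
```

Comparing the output for model generations against the output for human responses shows which bins the model over- or under-uses.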

Implications and Future Directions

Unlikelihood training has practical implications for AI-driven conversational agents. A training signal that penalizes undesirable dialogue sequences, making them unlikely at generation time, can improve dialogue flow and consistency and so improve user interactions with AI systems. Practical applications include customer service chatbots, where coherent interaction is crucial.

The authors suggest that future research could apply unlikelihood training to areas such as causal and commonsense reasoning, using datasets such as HellaSwag and other logic-centered corpora. Such extensions may lead to AI systems with a more nuanced understanding of conversational context, improving both the naturalness and the usefulness of AI dialogue.

In summary, the paper offers a general method for refining dialogue generation models: it targets weaknesses that standard likelihood training leaves unaddressed, and it provides a basis for further work on natural language processing and AI dialogue systems.
