
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System (2109.14739v2)

Published 29 Sep 2021 in cs.CL

Abstract: Pre-trained language models have been recently shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem which can lead to error accumulation across different sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In addition, we introduce a new dialogue multi-task pre-training strategy that allows the model to learn the primary TOD task completion skills from heterogeneous dialog corpora. We extensively test our model on three benchmark TOD tasks, including end-to-end dialogue modelling, dialogue state tracking, and intent classification. Experimental results show that PPTOD achieves new state of the art on all evaluated tasks in both high-resource and low-resource scenarios. Furthermore, comparisons against previous SOTA methods show that the responses generated by PPTOD are more factually correct and semantically coherent as judged by human annotators.

Citations (177)

Summary

  • The paper introduces PPTOD, a unified plug-and-play model that leverages multi-task pre-training to streamline task-oriented dialogue systems.
  • The paper employs T5 variants and diverse multi-domain datasets to achieve superior performance in end-to-end dialogue modeling and intent classification.
  • The paper reduces error propagation and annotation costs by decoupling dialogue sub-tasks with task-specific prompts, enabling scalability in low-resource setups.

Overview of Multi-Task Pre-Training for Task-Oriented Dialogue Systems

The paper presents PPTOD, a unified plug-and-play model designed for task-oriented dialogue (TOD) systems. The primary motivation is to address the limitations of existing TOD approaches, which largely depend on a cascaded generation framework. That framework can propagate errors across sub-tasks such as dialogue state tracking (DST), policy learning (POL), and natural language generation (NLG), and it also imposes extensive data annotation overhead. PPTOD addresses these issues with a unified architecture trained via a dialogue multi-task pre-training strategy.

Methodology and Key Innovations

PPTOD integrates the dialogue modules into a single neural architecture, leveraging pre-trained language models (PLMs) to reduce the need for full manual annotation across all sub-tasks. This is achieved through a multi-task pre-training approach, wherein the model is trained on a diverse set of TOD-related tasks, enabling it to acquire skills from partially annotated datasets. The core of PPTOD is a plug-and-play framework that decouples the sub-tasks via task-specific prompts, allowing their outputs to be generated in parallel rather than in a cascade.
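As a rough illustration of the plug-and-play idea, the sketch below prepends a task-specific prompt to a shared dialogue context and decodes each sub-task independently with a T5-style seq2seq model. The prompt strings and the `t5-small` checkpoint are illustrative assumptions, not the released PPTOD prompts or weights.

```python
# Minimal sketch of plug-and-play TOD inference: one seq2seq model handles
# different sub-tasks, selected purely by a prompt prepended to the dialogue
# context. Prompts and checkpoint name below are placeholders (assumptions).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")            # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")  # placeholder checkpoint

# Hypothetical task prompts, one per sub-task (NLU, DST, POL, NLG).
TASK_PROMPTS = {
    "nlu": "translate dialogue to user intent:",
    "dst": "translate dialogue to belief state:",
    "pol": "translate dialogue to dialogue act:",
    "nlg": "translate dialogue to system response:",
}

def run_subtask(task: str, dialogue_context: str) -> str:
    """Run one sub-task by prepending its prompt to the shared context."""
    text = f"{TASK_PROMPTS[task]} {dialogue_context}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

context = "[user] i need a cheap restaurant in the centre of town"
# Each call conditions only on the dialogue context, not on another sub-task's
# output, so the four generations are independent and can run in parallel.
for task in TASK_PROMPTS:
    print(task, "->", run_subtask(task, context))
```

Because no sub-task waits on another's output, latency is bounded by a single generation pass rather than a chain of them, which is the practical payoff of decoupling.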

The paper initializes PPTOD from T5 variants (small, base, large) and pre-trains it on heterogeneous dialogue corpora comprising over 2.3 million utterances across 80 domains. Eleven curated datasets with varying annotation coverage are used to cover the different TOD sub-tasks, including NLU, DST, POL, and NLG.
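The following sketch shows one plausible way such partially annotated data could be turned into multi-task training pairs: each turn contributes a (prompted context, target) pair only for the annotation types it actually carries. The field names and prompt strings are assumptions for illustration, not the paper's exact data format.

```python
# Hedged sketch of multi-task example construction from partially annotated
# corpora: missing annotation types are simply skipped, so every dataset can
# contribute whatever labels it has. Field names/prompts are illustrative.
from typing import Dict, List, Tuple

TASK_PROMPTS = {
    "intent": "translate dialogue to user intent:",
    "belief": "translate dialogue to belief state:",
    "act": "translate dialogue to dialogue act:",
    "response": "translate dialogue to system response:",
}

def build_examples(turn: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn one annotated dialogue turn into seq2seq (source, target) pairs."""
    examples = []
    context = turn["context"]  # flattened dialogue history
    for task, prompt in TASK_PROMPTS.items():
        target = turn.get(task)   # absent annotations yield no example
        if target:
            examples.append((f"{prompt} {context}", target))
    return examples

# Example: a turn annotated only with a belief state and a system response.
turn = {
    "context": "[user] book a table for two at 7pm",
    "belief": "[restaurant] people 2 time 7pm",
    "response": "sure, which restaurant would you like?",
}
for src, tgt in build_examples(turn):
    print(src, "=>", tgt)
```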

Experimental Evaluation

PPTOD is evaluated on several benchmarks, primarily MultiWOZ 2.0 and 2.1 for end-to-end dialogue modelling and dialogue state tracking, along with a user intent classification benchmark.

Numerical Results

  1. End-to-End Dialogue Modeling: PPTOD achieves strong full-data performance on MultiWOZ, with high Inform, Success, BLEU, and Combined scores (the Combined score is sketched below). The model also yields notable improvements in low-resource setups (with as little as 1% of the training data), outperforming baselines by substantial margins.
  2. Dialogue State Tracking: Although classification-based models slightly outperform PPTOD in joint goal accuracy, PPTOD’s generation-based approach scales more readily, adapting to new ontology labels without architectural changes.
  3. Intent Classification: The model exhibits robust accuracy in both limited and full training scenarios, underscoring its efficiency in task-oriented dialogues without requiring extra parameters for new tasks.
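For reference, the MultiWOZ Combined score mentioned in item 1 is conventionally computed as 0.5 × (Inform + Success) + BLEU, and the joint goal accuracy in item 2 counts a turn as correct only when the entire predicted belief state matches the gold state. The snippet below sketches both; the numeric values are placeholders, not the paper's reported results.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """MultiWOZ Combined score: 0.5 * (Inform + Success) + BLEU."""
    return 0.5 * (inform + success) + bleu

def joint_goal_accuracy(pred_states, gold_states) -> float:
    """Fraction of turns whose predicted belief state exactly matches the gold one.

    Each state is a {slot: value} dict; a turn counts only if all slots match.
    """
    correct = sum(p == g for p, g in zip(pred_states, gold_states))
    return correct / len(gold_states)

# Placeholder values for illustration only (not the paper's reported numbers).
print(combined_score(inform=85.0, success=75.0, bleu=19.0))  # -> 99.0
```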

Implications and Future Directions

The paper underscores the potential of a unified model like PPTOD for TOD tasks. By reducing inference latency and minimizing the error accumulation typically observed in cascaded methods, PPTOD sets a precedent for future research into unsupervised and few-shot learning paradigms within TOD systems. The implications are particularly significant in real-world applications where frequent ontology updates require adaptive dialogue models. Additionally, the methodological insights into task-specific prompt utilization could inspire innovations in multilingual and cross-domain dialogue systems.

Future research may explore enhancing the model’s understanding capabilities by integrating more refined NLU modules, or pursue semi-supervised learning to improve performance under scarce-data conditions. The foundation established by PPTOD points toward scalable dialogue systems capable of sustaining complex, multi-domain conversations, paving the way for more robust conversational agents.
