Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset

Published 12 Sep 2019 in cs.CL | (1909.05855v2)

Abstract: Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.

Summary

The paper introduces a comprehensive multi-domain dataset spanning 16 domains with over 16,000 dialogues to advance scalable task-oriented systems.
It employs a schema-guided paradigm using natural language service descriptions to dynamically integrate new APIs through zero-shot dialogue state tracking.
The accompanying dialogue state tracking model generalizes zero-shot to new APIs while remaining competitive in the regular setting, providing a baseline for future research on dialogue system scalability.

Overview of the Schema-Guided Dialogue Dataset

The paper “Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset” by Rastogi et al. introduces a dataset and an accompanying paradigm for building conversational agents that scale across many domains. The Schema-Guided Dialogue (SGD) dataset is positioned as a benchmark for dialogue models, with an emphasis on zero-shot generalization to new APIs and domains.

Core Contributions

The key contributions of the paper focus on addressing the limitations of existing dialogue datasets by offering:

Extensive Multi-Domain Coverage: The SGD dataset spans a diverse array of 16 domains, significantly surpassing existing corpora in dialogue breadth and depth, with over 16,000 conversations. This diversity is crucial for training models that can adapt to various tasks across different services.
Schema-Guided Paradigm: The paper advocates for a schema-guided approach to task-oriented dialogue, wherein dialogue models operate based on a dynamic schema of intents and slots defined at runtime. This mechanism allows the seamless addition of new services without extensive retraining, leveraging natural language descriptions to interpret service schemas.
Zero-Shot Dialogue State Tracking: The authors present a dialogue state tracking model that builds on large pre-trained language models such as BERT for zero-shot generalization. This model can adapt to unseen domains and APIs while remaining competitive in the regular setting, where the services are seen during training.
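
The schema-guided idea can be illustrated with a deliberately small sketch. It is not the authors' model: the paper's tracker relies on a pre-trained encoder, whereas this example substitutes a simple token-overlap score so that it runs without dependencies. The schemas, intent names, and function names are hypothetical. The point it shows is that the intents are passed in as input, with their descriptions, so a new service needs no code or label-set change.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of two strings' token sets (toy stand-in for an encoder)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def predict_intent(utterance, schema):
    """Pick the intent whose natural language description best matches."""
    best = max(
        schema["intents"],
        key=lambda intent: similarity(utterance, intent["description"]),
    )
    return best["name"]


# Hypothetical schemas, supplied at run time rather than baked into the model.
flights = {
    "intents": [
        {"name": "SearchFlight", "description": "Search for a flight to a destination"},
        {"name": "ReserveFlight", "description": "Reserve a ticket on a flight"},
    ]
}
hotels = {
    "intents": [
        {"name": "SearchHotel", "description": "Search for a hotel in a city"},
        {"name": "BookHotel", "description": "Book a room at a hotel"},
    ]
}

print(predict_intent("I want to search for a flight to Paris", flights))
print(predict_intent("Please book a room at a hotel for me", hotels))
```

The same `predict_intent` function serves both services because nothing about either service is fixed in the code. A real system would replace `similarity` with a learned scoring function.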

Methodological Insights

The SGD dataset addresses several challenges associated with real-world conversational AI by proposing a novel dialogue collection technique. A simulation-based framework pairs synthesized dialogues with crowd-sourced paraphrasing to create natural, annotated conversations. These conversations maintain close alignment with potential user interaction flows in real-world applications, ensuring both naturalness and complexity.
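
A toy sketch of this two-stage collection idea follows. It assumes a much simpler simulator than the one used for the dataset: the act names, templates, and helper functions here are illustrative only. It shows why annotations stay reliable: the labels are produced by the simulator, and the paraphrase step changes only the surface text.

```python
def simulate_outline(intent_name, slot_values):
    """Generate an annotated dialogue outline for one intent (toy simulator)."""
    outline = [
        {"speaker": "USER", "act": "INFORM_INTENT", "slot": "intent", "value": intent_name}
    ]
    for slot, value in slot_values.items():
        # The system asks for each slot, then the user provides its value.
        outline.append({"speaker": "SYSTEM", "act": "REQUEST", "slot": slot, "value": None})
        outline.append({"speaker": "USER", "act": "INFORM", "slot": slot, "value": value})
    return outline


# Hypothetical templates that turn an annotated act into a stilted utterance.
TEMPLATES = {
    "INFORM_INTENT": "I want to {value}.",
    "REQUEST": "What is the {slot}?",
    "INFORM": "The {slot} is {value}.",
}


def to_template_utterance(turn):
    """Render one annotated turn as a template utterance for paraphrasing."""
    return TEMPLATES[turn["act"]].format(slot=turn["slot"], value=turn["value"])


def attach_paraphrase(turn, paraphrase):
    """Keep the simulator's annotation and add a human-written surface form."""
    return {**turn, "utterance": paraphrase}


outline = simulate_outline("FindRestaurant", {"city": "Oakland"})
for turn in outline:
    print(turn["speaker"], "-", to_template_utterance(turn))

# A crowd worker's rewrite replaces the text, while act, slot, and value are unchanged.
final_turn = attach_paraphrase(outline[2], "Somewhere in Oakland, please.")
print(final_turn)
```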

Key aspects of the methodology include:

Service Schemas: Defined schemas encapsulate the parameters for each service, including intents and slots, catering for variations and constraints typical of real API interactions.
Simulation Framework: Automated simulations generate structured dialogue flows, which are then paraphrased by crowd workers to produce natural language dialogues. This approach minimizes annotation errors and reduces collection costs.
Scalability Focus: The schema-guided paradigm enables dynamic interpretation and handling of service descriptions, positioning the model for scalability across an extensive network of APIs.
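
To make the notion of a service schema concrete, here is a minimal sketch of one as a data structure. The field names and the example service are assumptions chosen for illustration and may not match the released files exactly. The small consistency check is likewise hypothetical, not part of the dataset tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    description: str  # natural language description read by the model
    is_categorical: bool = False
    possible_values: list = field(default_factory=list)


@dataclass
class Intent:
    name: str
    description: str
    required_slots: list = field(default_factory=list)
    optional_slots: list = field(default_factory=list)


@dataclass
class ServiceSchema:
    service_name: str
    description: str
    slots: list
    intents: list

    def undefined_slots(self):
        """Return (intent, slot) pairs that refer to a slot the service lacks."""
        defined = {slot.name for slot in self.slots}
        missing = []
        for intent in self.intents:
            for slot_name in intent.required_slots + intent.optional_slots:
                if slot_name not in defined:
                    missing.append((intent.name, slot_name))
        return missing


# Hypothetical restaurant service.
restaurants = ServiceSchema(
    service_name="Restaurants",
    description="Find and reserve tables at restaurants",
    slots=[
        Slot("city", "City where the restaurant is located"),
        Slot("price_range", "Price range of the restaurant",
             is_categorical=True,
             possible_values=["cheap", "moderate", "expensive"]),
    ],
    intents=[
        Intent("FindRestaurant", "Search for a restaurant in a city",
               required_slots=["city"], optional_slots=["price_range"]),
    ],
)

print(restaurants.undefined_slots())
```

Because every slot and intent carries a description, a model can in principle read an unseen schema like this one directly instead of relying on a fixed ontology.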

Results and Evaluation

The proposed framework is evaluated with four metrics: Active Intent Accuracy, Requested Slot F1, Average Goal Accuracy, and Joint Goal Accuracy. In comparative experiments, the prototype model achieves competitive results, indicating that it can handle dialogues across both seen and unseen services.
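
The following sketch gives simplified versions of these four metrics, to show what each one measures. It assumes exact string matching and per-turn dictionaries of slot values. The official evaluation may differ in details, for example by scoring free-form values more leniently, so treat these functions as illustrations and not as the reference scorer.

```python
def active_intent_accuracy(predicted, gold):
    """Fraction of turns where the predicted active intent is correct."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def requested_slot_f1(predicted, gold):
    """F1 between the predicted and gold sets of requested slots for one turn."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing requested and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_goal_accuracy(predicted, gold):
    """Per-slot accuracy over slots that have a value in the gold state."""
    hits = total = 0
    for pred_state, gold_state in zip(predicted, gold):
        for slot, value in gold_state.items():
            total += 1
            hits += int(pred_state.get(slot) == value)
    return hits / total if total else 0.0


def joint_goal_accuracy(predicted, gold):
    """Fraction of turns whose entire predicted state matches the gold state."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Two hypothetical turns: the second gets one of its two slots wrong.
gold_states = [{"city": "Paris"}, {"city": "Paris", "date": "Friday"}]
pred_states = [{"city": "Paris"}, {"city": "Paris", "date": "Monday"}]

print(joint_goal_accuracy(pred_states, gold_states))
print(average_goal_accuracy(pred_states, gold_states))
print(requested_slot_f1({"price", "address"}, {"price"}))
print(active_intent_accuracy(["FindHotel", "BookHotel"], ["FindHotel", "FindHotel"]))
```

Joint Goal Accuracy is the strictest of the four: a single wrong slot makes the whole turn count as incorrect, whereas Average Goal Accuracy gives partial credit per slot.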

Implications and Future Directions

The release of the SGD dataset provides a valuable resource for the development of more adaptable and robust dialogue systems. It has direct implications for the practical deployment of virtual assistants in increasingly diverse and dynamic environments. Additionally, the schema-guided approach marks a shift toward more flexible system architectures in which new domains and APIs can be integrated with little or no additional training data, which is essential for the future expansion of conversational AI systems.

This research lays a foundation for further work on zero-shot learning and robust dialogue state tracking, and encourages the development of more general-purpose dialogue systems. Future work may focus on richer model architectures that improve semantic understanding and response generation across a broader range of services and applications.
