
Abstract

LLMs are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogue (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains limited. In this work, we propose a novel approach, FnCTOD, for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and proprietary LLMs: with in-context prompting, it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance, beating the SOTA by 5.6% average joint goal accuracy (JGA). Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. The code is publicly available at https://github.com/facebookresearch/FnCTOD.

Figure: Comparison of zero-shot DST performance across cross-domain transfer and LLM-prompting approaches, with FnCTOD setting a new benchmark with GPT-4.

Overview

  • The novel FnCTOD approach utilizes LLMs for zero-shot dialogue state tracking (DST) by implementing function calling in conversational contexts, simplifying the deployment of conversational systems.

  • FnCTOD substantially improves the performance of both open-source and proprietary LLMs, setting a new zero-shot DST benchmark with GPT-4 and narrowing the gap between open-source models and ChatGPT.

  • Experiments on the MultiWOZ benchmark show that FnCTOD significantly outperforms prior state-of-the-art zero-shot DST methods, delivering substantial gains across various models through in-context prompting alone, without further fine-tuning.

  • FnCTOD not only offers theoretical insights into leveraging LLMs for task-specific functions without domain-specific training but also presents practical implications for the scalable deployment of chatbots and virtual assistants.

Leveraging LLMs for Zero-shot Dialogue State Tracking via Function Calling

Introduction to FnCTOD Approach

FnCTOD harnesses the capabilities of LLMs for zero-shot dialogue state tracking (DST) by introducing function calling into conversational contexts. This strategy circumvents the need for extensive data collection and model re-training for task-oriented dialogue (TOD), a significant bottleneck in deploying conversational systems across diverse domains. By embedding function specifications into the dialogue as part of the system prompt, FnCTOD enables LLMs to generate both dialogue states and responses seamlessly, a notable step toward practical, scalable, and versatile conversational systems.
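To make the prompt construction concrete, the sketch below converts a MultiWOZ-style domain schema into a JSON function specification and embeds it in a system prompt. The schema contents, the `find_restaurant` function name, and the prompt wording are illustrative assumptions; the paper's exact schema and prompt format may differ.

```python
import json

# Hypothetical MultiWOZ-style domain schema. Slot names and descriptions
# are illustrative, not the paper's exact schema format.
restaurant_schema = {
    "name": "find_restaurant",
    "description": "Track the user's constraints for finding a restaurant.",
    "slots": {
        "area": {"type": "string", "description": "Area of town",
                 "enum": ["centre", "north", "south", "east", "west"]},
        "food": {"type": "string", "description": "Cuisine type"},
        "pricerange": {"type": "string", "description": "Price range",
                       "enum": ["cheap", "moderate", "expensive"]},
    },
}

def schema_to_function_spec(schema: dict) -> dict:
    """Convert a domain schema into a JSON function specification."""
    return {
        "name": schema["name"],
        "description": schema["description"],
        "parameters": {"type": "object", "properties": schema["slots"]},
    }

def build_system_prompt(schemas: list[dict]) -> str:
    """Embed function specifications in the system prompt so the model
    emits a function call (the dialogue state) before its response."""
    specs = json.dumps([schema_to_function_spec(s) for s in schemas], indent=2)
    return (
        "You are a task-oriented assistant. Before each response, output a "
        "function call that captures the user's current constraints, using "
        "one of the functions below:\n" + specs
    )

print(build_system_prompt([restaurant_schema]))
```

Because the specifications live in the system prompt, swapping in a new domain only requires supplying its schema, with no retraining.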

Key Contributions and Results

The paper makes several key contributions through the FnCTOD methodology. First, it shows that FnCTOD significantly enhances the performance of both modestly sized open-source and proprietary LLMs through in-context prompting alone. Notably, the approach improves the performance of GPT-4 by 14%, establishing a new state of the art for zero-shot DST. Second, it bridges the performance gap between open-source models and ChatGPT by fine-tuning a 13B LLaMA2-Chat model on a diverse collection of task-oriented dialogues, equipping the model with function-calling DST capabilities while preserving its chat capabilities.

Empirical Validation

The experimental validation on the MultiWOZ benchmark illustrates FnCTOD's efficacy in enhancing zero-shot DST performance, without further fine-tuning, across various open-source and proprietary models. The approach significantly outperforms existing state-of-the-art methods: it boosts GPT-3.5 by 4.8% and GPT-4 by a remarkable 14% average joint goal accuracy (JGA), and with ChatGPT it beats the prior SOTA by 5.6% average JGA. Additionally, the fine-tuned 13B parameter LLaMA2-Chat model performs comparably to ChatGPT, underscoring the approach's utility in upgrading moderately sized models for zero-shot DST.
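For readers unfamiliar with the metric, JGA counts a turn as correct only when the predicted dialogue state matches the gold state exactly across all domains and slots. A minimal sketch with toy states (illustrative values, not data from the paper):

```python
def joint_goal_accuracy(predicted: list[dict], gold: list[dict]) -> float:
    """Fraction of turns whose predicted state exactly matches the gold
    state across all domains and slots."""
    assert len(predicted) == len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Toy two-turn example: the second turn gets one slot value wrong.
gold = [{"restaurant": {"area": "centre"}},
        {"restaurant": {"area": "centre", "food": "italian"}}]
pred = [{"restaurant": {"area": "centre"}},
        {"restaurant": {"area": "centre", "food": "thai"}}]
print(joint_goal_accuracy(pred, gold))  # 0.5
```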

Methodological Insights

FnCTOD reframes DST as a function calling task, converting domain schemas into function specifications embedded within the dialogue prompt. Under this formulation, the LLM generates function calls that directly encode the dialogue state. Ablations show that decomposing function call generation, selecting the function before filling in its arguments, improves over non-decomposed generation, and that fine-tuning on a modest, diverse dataset is enough to obtain strong zero-shot generalization; see the sketch below.
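The sketch below shows the parsing side under an assumed output format: the model wraps its call in `<fn_call>` tags and names functions `find_<domain>`. Both conventions are assumptions for illustration, not necessarily the paper's exact format.

```python
import json
import re

# Assumed output format: <fn_call> {JSON call} </fn_call>, with functions
# named "find_<domain>". The paper's actual format may differ.
FN_CALL_RE = re.compile(r"<fn_call>\s*(\{.*?\})\s*</fn_call>", re.DOTALL)

def parse_dialogue_state(model_output: str) -> dict:
    """Extract a {domain: {slot: value}} dialogue state from a generated
    function call; returns {} when no call is found."""
    match = FN_CALL_RE.search(model_output)
    if match is None:
        return {}
    call = json.loads(match.group(1))
    domain = call.get("function", "").removeprefix("find_")
    return {domain: call.get("arguments", {})}

output = ('<fn_call> {"function": "find_restaurant", '
          '"arguments": {"area": "centre", "food": "italian"}} </fn_call>')
print(parse_dialogue_state(output))
# -> {'restaurant': {'area': 'centre', 'food': 'italian'}}
```

In the decomposed variant, the model would first be prompted to emit only the function name (the active domain), then prompted again to fill that function's arguments, replacing one monolithic generation with two shorter, more constrained ones.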

Theoretical and Practical Implications

From a theoretical standpoint, FnCTOD advances our understanding of how to leverage LLMs for task-specific functions without domain-specific training data, enhancing the adaptability of conversational systems. Practically, the approach paves the way for scalable and efficient deployment of chatbots and virtual assistants across many domains, significantly reducing the overhead of model training and data annotation for new domains.

Future Directions

While FnCTOD provides a robust framework for incorporating DST into TOD systems through LLMs, higher accuracy is still needed for practical deployment. Future advances in LLM capabilities, coupled with methodological refinements to FnCTOD, are expected to further improve performance. Moreover, developing more realistic evaluation protocols for TOD systems, especially for response generation, will be crucial to realizing the full potential of such conversational models in real-world applications.

Concluding Remarks

FnCTOD represents a pivotal step forward in the quest to utilize LLMs for the dynamic and diverse realm of task-oriented dialogues. By enabling zero-shot DST through function calling, this approach mitigates significant barriers to deploying conversational systems across various domains, offering a blueprint for future innovations in the field of conversational AI.
