InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems (2310.08885v1)
Abstract: Large language models (LLMs) have been used for diverse tasks in NLP, yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any knowledge base (KB). Our extensive experiments demonstrate that InstructTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations are available at https://github.com/WillyHC22/InstructTODS/
Authors: Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Pascale Fung