Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk (2401.05033v1)
Abstract: LLMs are powerful dialogue agents, but specializing them to fulfill a specific function can be challenging. Instruction tuning, i.e. tuning models on instructions and sample responses generated by humans (Ouyang et al., 2022), has proven to be an effective method to do so, yet it requires a number of data samples that a) might not be available or b) might be costly to generate. Furthermore, this cost increases when the goal is to make the LLM follow a specific workflow within a dialogue instead of single instructions. Inspired by the self-play technique in reinforcement learning and the use of LLMs to simulate human agents, we propose a more effective method for data collection through LLMs engaging in a conversation in various roles. This approach generates training data via "self-talk" of LLMs that can be refined and utilized for supervised fine-tuning. We introduce an automated way to measure the (partial) success of a dialogue. This metric is used to filter the generated conversational data that is fed back into the LLM for training. Based on our automated and human evaluations of conversation quality, we demonstrate that such self-talk data improves results. In addition, we examine the various characteristics that showcase the quality of generated dialogues and how they can be connected to their potential utility as training data.
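The pipeline the abstract describes (two LLMs converse in different roles, each dialogue is scored by an automated success metric, and only sufficiently successful dialogues are kept as fine-tuning data) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the role functions `client_llm` and `agent_llm` stand in for prompted LLMs, and the subgoal-matching metric is a simplified stand-in for the paper's automated success measure.

```python
# Minimal sketch of the self-talk data-collection loop from the abstract.
# The role functions and the success metric below are hypothetical
# stand-ins, not the paper's actual models or metric.
from typing import Callable, List, Tuple

Dialogue = List[Tuple[str, str]]  # sequence of (speaker, utterance) turns


def self_talk(client_llm: Callable[[Dialogue], str],
              agent_llm: Callable[[Dialogue], str],
              max_turns: int = 6) -> Dialogue:
    """Let two LLMs converse in client/agent roles for a fixed number of turns."""
    dialogue: Dialogue = []
    for _ in range(max_turns):
        dialogue.append(("client", client_llm(dialogue)))
        dialogue.append(("agent", agent_llm(dialogue)))
    return dialogue


def subgoal_success(dialogue: Dialogue, subgoals: List[str]) -> float:
    """Toy (partial) success metric: fraction of workflow subgoals the agent covers."""
    agent_text = " ".join(u for speaker, u in dialogue if speaker == "agent").lower()
    return sum(goal.lower() in agent_text for goal in subgoals) / len(subgoals)


def collect_training_data(dialogues: List[Dialogue],
                          subgoals: List[str],
                          threshold: float = 0.5) -> List[Dialogue]:
    """Filter: keep only dialogues whose measured success reaches the threshold."""
    return [d for d in dialogues if subgoal_success(d, subgoals) >= threshold]
```

The retained dialogues would then be refined and used for supervised fine-tuning; the threshold controls the quality/quantity trade-off of the generated training set.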
- Self-consuming generative models go mad. arXiv preprint arXiv:2307.01850.
- Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3):337–351.
- Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
- Action-based conversations dataset: A corpus for building more in-depth task-oriented dialogue systems. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3002–3017.
- Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128.
- ChatAug: Leveraging ChatGPT for text data augmentation. arXiv preprint arXiv:2302.13007.
- An optimal transportation approach for assessing almost stochastic order. In The Mathematics of the Uncertain, pages 33–44. Springer.
- Deep dominance - how to properly compare deep neural models. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pages 2773–2785. Association for Computational Linguistics.
- Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 889–898. Association for Computational Linguistics.
- A survey on bias in deep NLP. Applied Sciences, 11(7):3184.
- Xinyang Geng and Hao Liu. 2023. OpenLLaMA: An open reproduction of LLaMA.
- Self-verification improves few-shot clinical information extraction. arXiv preprint arXiv:2306.00024.
- Reinforced self-training (ReST) for language modeling.
- Julian Hazell. 2023. Large language models can be used to effectively scale spear phishing campaigns. arXiv preprint arXiv:2305.06972.
- DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing.
- DeBERTa: Decoding-enhanced BERT with disentangled attention. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.
- The curious case of neural text degeneration. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.
- Learning to write with cooperative discriminators. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 1638–1649. Association for Computational Linguistics.
- LoRA: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022.
- Jennifer Hu and Roger Levy. 2023. Prompt-based methods may underestimate large language models’ linguistic generalizations. arXiv preprint arXiv:2305.13264.
- Controllable dialogue simulation with in-context learning. In Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, pages 4330–4347. Association for Computational Linguistics.
- Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81.
- A generative user simulator with GPT-based architecture and goal state tracking for reinforced multi-domain dialog systems. arXiv preprint arXiv:2210.08692.
- Training socially aligned language models in simulated human society. arXiv preprint arXiv:2305.16960.
- MosaicML NLP Team. 2023. Introducing MPT-7B: A new standard for open-source, commercially usable LLMs. Accessed: 2023-05-05.
- Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
- Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.
- REFINER: Reasoning feedback on intermediate representations. arXiv preprint arXiv:2304.01904.
- Jordan Pollack and Alan Blair. 1996. Why did TD-Gammon work? Advances in Neural Information Processing Systems, 9.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290.
- Is reinforcement learning (not) for natural language processing: Benchmarks, baselines, and building blocks for natural language policy optimization. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023.
- Neural theory-of-mind? on the limits of social intelligence in large lms. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, pages 3762–3780. Association for Computational Linguistics.
- Self-critiquing models for assisting human evaluators. arXiv preprint arXiv:2206.05802.
- Training language models with language feedback at scale. arXiv preprint arXiv:2303.16755.
- Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609.
- Bootstrapping a neural conversational agent with dialogue self-play, crowdsourcing and on-line reinforcement learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 41–51.
- Building a conversational agent overnight with dialogue self-play. arXiv preprint arXiv:1801.04871.
- Model dementia: Generated data makes models forget. arXiv preprint arXiv:2305.17493.
- Deploying lifelong open-domain dialogue learning. arXiv preprint arXiv:2008.08076.
- Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489.
- Mastering the game of go without human knowledge. Nature, 550(7676):354–359.
- Karolina Stanczak and Isabelle Augenstein. 2021. A survey on gender bias in natural language processing. arXiv preprint arXiv:2112.14168.
- Gerald Tesauro. 1994. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215–219.
- Together Computer. 2023. RedPajama-Data: An open source recipe to reproduce the LLaMA training dataset.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- deep-significance: Easy and meaningful significance testing in the age of neural networks. In ML Evaluation Standards Workshop at the Tenth International Conference on Learning Representations.
- Learning to speak and act in a fantasy text adventure game. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 673–683. Association for Computational Linguistics.
- Michiel Van Der Ree and Marco Wiering. 2013. Reinforcement learning in the game of othello: Learning against a fixed opponent and learning from self-play. In 2013 IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL), pages 108–115. IEEE.
- Disembodied machine learning: On the illusion of objectivity in NLP. arXiv preprint arXiv:2101.11974.
- GPT3Mix: Leveraging large-scale language models for text augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, pages 2225–2239. Association for Computational Linguistics.
- SGP-TOD: Building task bots effortlessly via schema-guided LLM prompting. arXiv preprint arXiv:2305.09067.
- A survey of active learning for natural language processing. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, pages 6166–6190. Association for Computational Linguistics.
- AnyTOD: A programmable task-oriented dialog system. arXiv preprint arXiv:2212.09939.