Leveraging User Simulation to Develop and Evaluate Conversational Information Access Agents (2312.08041v1)
Abstract: We observe a change in the way users access information: the rise of conversational information access (CIA) agents. However, the automatic evaluation of these agents remains an open challenge. Moreover, training CIA agents is cumbersome, as it mostly relies on conversational corpora, expert knowledge, and reinforcement learning. User simulation has been identified as a promising solution for automatic evaluation and has previously been used in reinforcement learning. In this research, we investigate how user simulation can be leveraged in the context of CIA. We organize the work in three parts. We begin by identifying requirements for user simulators for training and evaluating CIA agents and comparing existing types of simulators against these requirements. Then, we plan to combine these different types of simulators into a new hybrid simulator. Finally, we aim to extend simulators to handle more complex information-seeking scenarios.