On the Multi-turn Instruction Following for Conversational Web Agents (2402.15057v1)

Published 23 Feb 2024 in cs.CL and cs.AI

Abstract: Web agents powered by large language models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset and to validate the effectiveness of the proposed method.
