RoCo: Dialectic Multi-Robot Collaboration with Large Language Models (2307.04738v1)
Abstract: We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained LLMs for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach -- it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility -- in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website https://project-roco.github.io for videos and code.
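The plan-then-refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`plan_with_feedback`, `min_separation_feedback`, `scripted_proposer`), the robot names, and the distance threshold are all hypothetical, the LLM dialog is replaced by a scripted stub, and collision checking is reduced to a step-wise distance test between waypoints.

```python
from typing import Callable, Dict, List, Optional, Tuple

Waypoint = Tuple[float, float, float]
Plan = Dict[str, List[Waypoint]]  # robot name -> task-space waypoint path


def min_separation_feedback(plan: Plan, min_dist: float = 0.1) -> Optional[str]:
    """Toy stand-in for collision checking (assumption, not the paper's checker).

    Compares the robots' waypoints step by step and returns a text message
    for the first pair that comes closer than min_dist, or None if all pass.
    """
    names = sorted(plan)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for step, (p, q) in enumerate(zip(plan[a], plan[b])):
                dist = sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
                if dist < min_dist:
                    return f"{a} and {b} are too close at step {step} (distance {dist:.2f})"
    return None


def plan_with_feedback(
    propose: Callable[[List[str]], Plan],
    check: Callable[[Plan], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[Optional[Plan], List[str]]:
    """Ask for a plan, check it, and feed any failure message back in-context."""
    feedback_history: List[str] = []
    for _ in range(max_rounds):
        plan = propose(feedback_history)  # in a real system: an LLM dialog round
        feedback = check(plan)
        if feedback is None:
            return plan, feedback_history  # plan passed the check
        feedback_history.append(feedback)  # shown to the agents on the next round
    return None, feedback_history  # no valid plan within the round budget


def scripted_proposer(feedback_history: List[str]) -> Plan:
    """Hypothetical stub for the LLM agents: moves 'bob' aside after each complaint."""
    offset = 0.5 * len(feedback_history)
    return {
        "alice": [(0.0, 0.0, 0.2), (0.2, 0.0, 0.2)],
        "bob": [(0.0, offset, 0.2), (0.2, offset, 0.2)],
    }


if __name__ == "__main__":
    plan, history = plan_with_feedback(scripted_proposer, min_separation_feedback)
    print("feedback given:", history)
    print("accepted plan:", plan)
```

In this toy run the first proposal places both robots on the same path, the checker reports the conflict as text, and the second proposal is accepted. In a full system the accepted waypoints would then be handed to a motion planner, which this sketch does not model.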