RoCo: Dialectic Multi-Robot Collaboration with Large Language Models (2307.04738v1)
Abstract: We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained LLMs for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason about task strategies. They then generate sub-task plans and task-space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach: it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility: in real-world experiments, we show that RoCo easily incorporates a human in the loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See the project website https://project-roco.github.io for videos and code.
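The plan, check, and re-prompt cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Proposer`, `check_collisions`, `plan_with_feedback`, and `scripted_proposer`, the 0.1 m distance threshold, and the round limit are all assumptions made for illustration, and the LLM dialog is replaced by a scripted stand-in so the example runs without any model. The collision check here is a simple same-step distance test between waypoints, not a full motion-planning check.

```python
from typing import Callable, Dict, List, Optional, Tuple

Waypoint = Tuple[float, float, float]
Plan = Dict[str, List[Waypoint]]  # robot name -> task-space waypoint path

# Hypothetical stand-in for one LLM dialog round:
# it receives the feedback history and returns a joint plan.
Proposer = Callable[[List[str]], Plan]


def check_collisions(plan: Plan, min_dist: float = 0.1) -> List[str]:
    """Return one feedback message per step where two robots' waypoints are too close."""
    feedback = []
    names = sorted(plan)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for step, (pa, pb) in enumerate(zip(plan[a], plan[b])):
                dist = sum((x - y) ** 2 for x, y in zip(pa, pb)) ** 0.5
                if dist < min_dist:
                    feedback.append(
                        f"step {step}: {a} and {b} are {dist:.2f} m apart"
                    )
    return feedback


def plan_with_feedback(propose: Proposer, max_rounds: int = 3) -> Optional[Plan]:
    """Ask for a plan, validate it, and re-prompt with feedback until it passes."""
    history: List[str] = []
    for _ in range(max_rounds):
        plan = propose(history)
        feedback = check_collisions(plan)
        if not feedback:
            return plan  # plan passed the environment check
        history.extend(feedback)  # feedback is given back in-context
    return None  # no valid plan within the round budget


def scripted_proposer(history: List[str]) -> Plan:
    """Scripted agent pair: the first proposal collides, the revised one does not."""
    if not history:
        return {
            "alice": [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
            "bob": [(1.0, 0.0, 0.0), (0.5, 0.05, 0.0)],
        }
    return {
        "alice": [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
        "bob": [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0)],
    }


if __name__ == "__main__":
    print(plan_with_feedback(scripted_proposer))
```

In a real system the proposer would wrap the multi-agent LLM dialog and the accepted waypoints would be handed to a motion planner; the sketch only shows how environment feedback can drive the re-planning loop.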