RePLan: Robotic Replanning with Perception and Language Models (2401.04157v2)
Abstract: Advances in large language models (LLMs) have demonstrated their potential for high-level reasoning, logical reasoning, and robot planning. Recently, LLMs have also been used to generate reward functions for low-level robot actions, effectively bridging high-level planning and low-level robot control. However, even with syntactically correct plans, robots can still fail to achieve their intended goals due to imperfect plans or unexpected environmental obstacles. Meanwhile, Vision Language Models (VLMs) have shown remarkable success in tasks such as visual question answering. Leveraging these capabilities, we present Robotic Replanning with Perception and Language Models (RePLan), a novel framework that enables online replanning for long-horizon tasks. RePLan uses the physical grounding provided by a VLM's understanding of the world state to adapt robot actions when the initial plan fails to achieve the desired goal. We developed a Reasoning and Control (RC) benchmark with eight long-horizon tasks to test our approach. We find that RePLan enables a robot to adapt to unforeseen obstacles while accomplishing open-ended, long-horizon goals where baseline models cannot, and that it can be readily applied to real robots. Find more information at https://replan-lm.github.io/replan.github.io/
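The abstract describes a perceive-plan-act-verify loop: an LLM proposes a plan, a controller executes each step, and a VLM checks whether the step achieved its subgoal, feeding failures back into replanning. The following is a minimal sketch of that control flow under stated assumptions; all names (`Planner`, `Perceiver`, `run`, the canned plans and verification results) are hypothetical stand-ins, not the paper's actual API.

```python
# Minimal sketch of an online replanning loop in the spirit of RePLan.
# The Planner stands in for the high-level LLM, the Perceiver for the
# VLM verifier; both return canned answers purely to illustrate the loop.

from dataclasses import dataclass


@dataclass
class Plan:
    steps: list  # ordered high-level subgoals, e.g. ["open drawer", ...]


class Planner:
    """Stand-in for the high-level LLM planner."""

    def propose(self, goal, feedback=None):
        # A real system would prompt an LLM with the goal and any failure
        # feedback; here we return a fixed plan and prepend a recovery
        # step whenever feedback reports a failed subgoal.
        steps = ["reach object", "grasp object", "place object"]
        if feedback:
            steps = ["clear obstacle"] + steps
        return Plan(steps)


class Perceiver:
    """Stand-in for the VLM: did the executed step achieve its subgoal?"""

    def __init__(self):
        self.calls = 0

    def verify(self, step):
        # Pretend the very first step fails (e.g. blocked by an
        # unexpected obstacle) and everything succeeds after replanning.
        self.calls += 1
        return self.calls > 1


def run(goal, planner, perceiver, max_replans=3):
    feedback = None
    for _ in range(max_replans):
        plan = planner.propose(goal, feedback)
        for step in plan.steps:
            # (a low-level controller would execute `step` here)
            if not perceiver.verify(step):
                feedback = f"failed at: {step}"  # grounded failure report
                break  # abandon the remaining steps and replan
        else:
            return True  # every step verified: goal achieved
    return False
```

In this toy run, the first plan fails at its first step, the failure report is fed back to the planner, and the revised plan (with the recovery step) completes, so `run(...)` returns `True`. The key design point mirrored from the abstract is that verification is perceptual and per-step, so failures are caught online rather than at the end of the plan.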