
Verifiably Following Complex Robot Instructions with Foundation Models (2402.11498v2)

Published 18 Feb 2024 in cs.RO and cs.AI

Abstract: Enabling mobile robots to follow complex natural language instructions is an important yet challenging problem. People want to flexibly express constraints, refer to arbitrary landmarks and verify behavior when instructing robots. Conversely, robots must disambiguate human instructions into specifications and ground instruction referents in the real world. We propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow expressive and complex open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct-by-construction. We perform a large scale evaluation and demonstrate our approach on 150 instructions in five real-world environments showing the generality of our approach and the ease of deployment in novel unstructured domains. In our experiments, LIMP performs comparably with state-of-the-art LLM task planners and LLM code-writing planners on standard open vocabulary tasks and additionally achieves 79% success rate on complex spatiotemporal instructions while LLM and Code-writing planners both achieve 38%. See supplementary materials and demo videos at https://robotlimp.github.io

Authors (4)
  1. Benedict Quartey
  2. Eric Rosen
  3. Stefanie Tellex
  4. George Konidaris
Citations (7)

Summary

  • The paper introduces LIMP, a system that translates natural language instructions into enriched temporal logic via a two-stage prompting method.
  • It employs dynamic semantic mapping with visual language models to create Referent Semantic Maps for precise object localization.
  • The Progressive Motion Planner integrates finite-state automata with task and motion planning, achieving 90% navigation and 71% manipulation success rates.

Verifiably Following Complex Robot Instructions with Foundation Models

The paper "Verifiably Following Complex Robot Instructions with Foundation Models" introduces Language Instruction grounding for Motion Planning (LIMP), a system designed to enable robots to interpret and execute complex natural language instructions. The approach combines foundation models with temporal logic to handle instructions involving spatiotemporal constraints and open-vocabulary referents.

Key Contributions

  1. Instruction Translation into Temporal Logic: LIMP translates natural language instructions into temporal logic specifications using LLMs. A two-stage prompting technique first maps instructions into traditional linear temporal logic (LTL) formulas and then transforms them into a syntax enriched with Composable Referent Descriptors (CRDs). These CRDs encode descriptive spatial relationships, enabling nuanced referent disambiguation (see the translation sketch after this list).
  2. Dynamic Semantic Mapping: The system generates Referent Semantic Maps (RSMs) to localize specific object instances based on the spatial relationships resolved in the translated instructions. It leverages vision-language models (VLMs) to detect candidate objects and applies spatial reasoning to filter these detections down to the intended referents (see the grounding sketch after this list).
  3. Task and Motion Planning (TAMP): The paper proposes a Progressive Motion Planner that compiles the temporal logic specification into a finite-state automaton and decomposes it into actionable subtasks. This planner coordinates navigation and manipulation skills dynamically, restructuring the environment map into Task Progression Semantic Maps (TPSMs) for real-time path planning. The approach guarantees correct-by-construction behavior through goal-directed and constraint-aware navigation (see the planning sketch after this list).
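
To make the two-stage translation concrete, here is a minimal sketch in Python. The prompt templates, the `llm` callable, and the descriptor syntax (e.g. `near(door::is_left_of(whiteboard))`) are illustrative assumptions, not the paper's actual prompts or grammar.

```python
from typing import Callable

# Stage 1: instruction -> LTL skeleton over abstract propositions (assumed prompt).
STAGE1_PROMPT = (
    "Translate the instruction into a linear temporal logic (LTL) formula "
    "over abstract propositions p1, p2, ...\n"
    "Instruction: {instruction}\nLTL:"
)

# Stage 2: replace each abstract proposition with a composable referent descriptor.
STAGE2_PROMPT = (
    "Rewrite each abstract proposition in the LTL formula as a composable "
    "referent descriptor, e.g. near(mug::is_on_top_of(table)).\n"
    "Instruction: {instruction}\nLTL: {ltl}\nEnriched LTL:"
)

def translate_instruction(instruction: str, llm: Callable[[str], str]) -> str:
    """Two-stage prompting: plain LTL skeleton first, then referent enrichment."""
    ltl = llm(STAGE1_PROMPT.format(instruction=instruction)).strip()
    return llm(STAGE2_PROMPT.format(instruction=instruction, ltl=ltl)).strip()

# Stubbed LLM for illustration; a real system would call a foundation model API.
def fake_llm(prompt: str) -> str:
    if "abstract propositions" in prompt:
        return "F(p1 & F(p2))"
    return "F(near(door::is_left_of(whiteboard)) & F(near(kitchen)))"

print(translate_instruction(
    "Go to the door left of the whiteboard, then go to the kitchen", fake_llm))
```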
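
The next sketch shows the flavor of grounding a spatially qualified referent against open-vocabulary detections, as a Referent Semantic Map might. The detection format and the distance-based `near` scoring are simplifying assumptions rather than LIMP's actual mapping pipeline.

```python
from dataclasses import dataclass
from math import dist

@dataclass
class Detection:
    label: str          # open-vocabulary class name from the VLM detector
    position: tuple     # (x, y) in the metric map frame
    confidence: float   # detector score

def resolve_near(target: str, anchor: str, detections: list) -> Detection:
    """Return the target instance closest to any anchor instance."""
    targets = [d for d in detections if d.label == target]
    anchors = [d for d in detections if d.label == anchor]
    if not targets or not anchors:
        raise ValueError(f"missing detections for {target!r} or {anchor!r}")

    # Score each candidate by distance to its nearest anchor, weighted by confidence.
    def score(d: Detection) -> float:
        return min(dist(d.position, a.position) for a in anchors) / max(d.confidence, 1e-6)

    return min(targets, key=score)

# Example: ground "the mug near the sink" in a toy detection set.
dets = [
    Detection("mug", (1.0, 2.0), 0.9),
    Detection("mug", (6.5, 0.5), 0.8),
    Detection("sink", (1.2, 2.4), 0.95),
]
print(resolve_near("mug", "sink", dets).position)   # -> (1.0, 2.0)
```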
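
Finally, a toy sketch of automaton-guided progressive planning: the hand-written automaton below stands in for the standard LTL-to-automaton translation used in approaches like LIMP, and `navigate_to` abstracts the underlying motion planner.

```python
# Automaton for "F(goal_a & F goal_b)" (reach goal_a, then goal_b).
# States: 0 = need goal_a, 1 = need goal_b, 2 = accepting (task complete).
TRANSITIONS = {
    (0, "goal_a"): 1,
    (1, "goal_b"): 2,
}
SUBGOAL_FOR_STATE = {0: "goal_a", 1: "goal_b"}

def progressive_plan(navigate_to, state: int = 0) -> None:
    """Advance the automaton by navigating to the proposition that progresses it."""
    while state != 2:
        subgoal = SUBGOAL_FOR_STATE[state]
        navigate_to(subgoal)                   # motion planner drives to the region
        state = TRANSITIONS[(state, subgoal)]  # take the automaton transition

# Example with a stubbed motion planner:
progressive_plan(lambda g: print(f"navigating to {g}"))
```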

Strong Numerical Results

The system was tested on 35 complex real-world instructions, achieving a 90% success rate in object-goal navigation and 71% in mobile manipulation tasks. The two-stage prompting approach, which selects semantically similar in-context examples, outperformed single-stage and random-example-selection baselines on several metrics, including referent resolution accuracy and temporal alignment accuracy.

Implications and Future Work

The theoretical and practical implications of LIMP are noteworthy. Practically, it provides a robust framework for robots to interpret and act upon human instructions in diverse, unstructured environments, without requiring pre-established semantic maps. Theoretically, it underscores the potential of interfacing foundation models with traditional planning frameworks, enhancing the explainability and alignment of robot behaviors.

Future work could address limitations such as non-reactivity in dynamic environments and extend capabilities to handle non-finite instruction sequences. Furthermore, refining the optimality of the planning process and exploring the integration of more complex manipulation strategies would continue to enhance the system’s robustness and applicability.

LIMP represents a meaningful step toward verifiable, reliable robotic systems capable of nuanced understanding and execution of human instructions in real-world scenarios. The paper effectively demonstrates the benefits of combining modern foundation models with classical planning methodologies, offering a promising avenue for advances in robotic autonomy.
