- The paper integrates LLMs with scene graph planning by translating natural language tasks into LTL formulas for optimized task execution.
- It employs a hierarchical planning domain using AMRA* and dual heuristics, significantly reducing computation time while ensuring optimality.
- Simulation results in complex semantic maps show reduced planning time, supporting the method's use for navigation by autonomous systems.
Optimal Scene Graph Planning with LLM Guidance
Introduction
The paper under discussion investigates how to integrate large language models (LLMs) with scene graph planning, focusing on optimal path planning within semantic maps. The central idea is to use an LLM to interpret a task specified in natural language and convert it into a Linear Temporal Logic (LTL) formula, which then guides search over a hierarchical planning domain. The approach aims to improve how autonomous systems execute complex semantic tasks in models of real-world environments.
Scene Graphs and Semantic Mapping
Advances in robot perception and computer vision have enabled sophisticated metric-semantic maps. These include hierarchical models that describe an environment through its semantic and topological relations, structures known as scene graphs. A scene graph models elements such as buildings, rooms, and objects in a single hierarchical representation.
This research explores using LLMs to convert natural language instructions into LTL formulas, an approach that bridges high-level task planning and semantic map navigation.
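As a rough illustration of the data structure described above, the sketch below stores a scene graph as a weighted graph whose nodes are tagged with attributes at several levels. The class name `SceneGraph`, its methods, and the node names are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: weighted edges plus attribute sets."""
    edges: dict = field(default_factory=dict)       # node -> {neighbor: cost}
    attributes: dict = field(default_factory=dict)  # kind -> {name: set of nodes}

    def add_edge(self, u, v, cost):
        # Undirected edge with a traversal cost.
        self.edges.setdefault(u, {})[v] = cost
        self.edges.setdefault(v, {})[u] = cost

    def tag(self, kind, name, nodes):
        # Attach an attribute (e.g. room "kitchen") to a set of nodes.
        self.attributes.setdefault(kind, {}).setdefault(name, set()).update(nodes)

    def label(self, node):
        # All (kind, name) attributes that hold at a node.
        return {(kind, name)
                for kind, names in self.attributes.items()
                for name, members in names.items()
                if node in members}


g = SceneGraph()
g.add_edge("n0", "n1", 1.2)
g.add_edge("n1", "n2", 0.8)
g.tag("floor", "floor_0", {"n0", "n1", "n2"})
g.tag("room", "kitchen", {"n1", "n2"})
g.tag("object", "oven", {"n2"})
print(g.label("n2"))
```

Keeping each attribute as a set of nodes makes it cheap to ask which semantic labels hold at a location, which is the query a planner over semantic tasks needs.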
Figure 1: Planning a natural language mission, μ: “Reach the oven in the kitchen”, in a scene graph G of the Gibson environment Benevolence (Xia et al., 2018) with object, room, and floor attributes.
Methodology
Natural Language Translation to Temporal Logic
The methodology leverages LLMs to translate natural language tasks into LTL formulas over the scene graph's semantic elements. The process involves constructing a hierarchical attribute representation of the scene graph, which serves as input to the LLM. This input enables the LLM to generate syntactically correct and co-safe LTL formulas necessary for task planning. The transformation from natural language to LTL is guided by an attribute hierarchy that succinctly conveys the scene configuration.
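One way to check that a formula returned by an LLM falls in the co-safe fragment is a syntactic test: the formula is in positive normal form (negation only on atomic propositions) and uses only the temporal operators next (X), eventually (F), and until (U). The sketch below applies that test to formulas encoded as nested tuples. The encoding, the function name, and the example formulas are assumptions for illustration; how the paper itself validates LLM output is not described here.

```python
# Formulas are nested tuples, a hypothetical encoding chosen for this sketch:
#   ("ap", name), ("not", f), ("and", f, g), ("or", f, g),
#   ("X", f), ("F", f), ("U", f, g), ("G", f), ("R", f, g)

def is_syntactically_cosafe(f):
    """Return True if f is in the syntactic co-safe fragment of LTL."""
    op, args = f[0], f[1:]
    if op == "ap":
        return True
    if op == "not":
        # Positive normal form: negation only directly on atomic propositions.
        return args[0][0] == "ap"
    if op in ("X", "F"):
        return is_syntactically_cosafe(args[0])
    if op in ("and", "or", "U"):
        return all(is_syntactically_cosafe(a) for a in args)
    # "G", "R", and unknown operators are rejected.
    return False


# Illustrative formulas; the atomic proposition names are assumptions.
reach_oven = ("F", ("and", ("ap", "kitchen"), ("F", ("ap", "oven"))))
always_avoid = ("G", ("not", ("ap", "bedroom")))
negated_eventually = ("not", ("F", ("ap", "oven")))

print(is_syntactically_cosafe(reach_oven))
print(is_syntactically_cosafe(always_avoid))
print(is_syntactically_cosafe(negated_eventually))
```

The test is sufficient but not necessary: a formula can be co-safe without being written in this syntactic form, so a rejected formula may only need rewriting rather than discarding.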
Hierarchical Planning Domain
The core contribution of this work is the establishment of a hierarchical planning domain. This structure integrates scene attributes and automaton guidance to facilitate efficient task execution. The domain consists of multiple levels corresponding to different semantic attributes (e.g., rooms, objects, floors), allowing the navigation algorithm to operate at varying resolutions.
The paper employs AMRA*, an anytime multi-resolution multi-heuristic variant of A*, to plan paths over this domain. The search stays efficient while retaining optimality guarantees, which depend on the use of consistent heuristic functions.
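To make the idea of automaton-guided search concrete, the sketch below runs plain A* (not AMRA*) over product states that pair a graph node with an automaton state. It uses a single resolution level and a zero heuristic by default, so it is a simplification of the hierarchical domain described above. The graph, the labels, and the hand-written automaton for "visit the kitchen, then reach the oven" are all assumptions for this example.

```python
import heapq
from itertools import count


def plan(edges, labels, dfa_step, q0, accepting, start, heuristic=lambda x, q: 0.0):
    """A* over (node, automaton state) pairs; returns (cost, path) or None."""
    s0 = (start, dfa_step(q0, labels.get(start, frozenset())))
    tie = count()  # tie-breaker so the heap never compares states or paths
    best = {s0: 0.0}
    frontier = [(heuristic(*s0), next(tie), 0.0, s0, [start])]
    while frontier:
        _, _, g, (x, q), path = heapq.heappop(frontier)
        if q in accepting:
            return g, path
        if g > best[(x, q)]:
            continue  # stale queue entry
        for y, cost in edges.get(x, {}).items():
            nq = dfa_step(q, labels.get(y, frozenset()))
            ng = g + cost
            if ng < best.get((y, nq), float("inf")):
                best[(y, nq)] = ng
                heapq.heappush(
                    frontier,
                    (ng + heuristic(y, nq), next(tie), ng, (y, nq), path + [y]),
                )
    return None


def dfa_step(q, label):
    # Hand-written automaton: state 0 -> 1 on "kitchen", state 1 -> 2 on "oven".
    if q == 0 and "kitchen" in label:
        q = 1
    if q == 1 and "oven" in label:
        q = 2
    return q


edges = {
    "bedroom": {"hall": 1.0},
    "hall": {"bedroom": 1.0, "kitchen_door": 2.0},
    "kitchen_door": {"hall": 2.0, "oven": 1.5},
    "oven": {"kitchen_door": 1.5},
}
labels = {
    "bedroom": frozenset({"bedroom"}),
    "kitchen_door": frozenset({"kitchen"}),
    "oven": frozenset({"kitchen", "oven"}),
}

cost, path = plan(edges, labels, dfa_step, q0=0, accepting={2}, start="bedroom")
print(cost, path)
```

Searching the product space means the goal is an automaton state rather than a fixed node, so the same search code serves any task that can be compiled into an automaton.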
Figure 2: Four-level hierarchical planning domain for Benevolence.
Heuristic Functions
Two distinct heuristic functions are central to this approach:
- LTL Heuristic: This heuristic is consistent, which supports the discovery of optimal paths. It bounds node costs using the automaton's structure.
- LLM Heuristic: This heuristic uses high-level guidance generated by the LLM to steer the search. It accounts for the semantics of the task and accelerates planning.
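The distinction between the two heuristics rests on consistency: a heuristic h is consistent if it is zero at every goal and satisfies h(u) <= c(u, v) + h(v) on every edge. The sketch below checks that property on a toy graph. The graph and both heuristic tables are assumptions for illustration; in particular, `h_guess` stands in for guidance scores with no guarantees and is not the paper's LLM heuristic.

```python
def is_consistent(edges, h, goals):
    """Check h(goal) == 0 and h(u) <= cost(u, v) + h(v) on every edge."""
    if any(h[g] != 0 for g in goals):
        return False
    return all(h[u] <= cost + h[v]
               for u, nbrs in edges.items()
               for v, cost in nbrs.items())


edges = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "goal": 2.0},
    "goal": {"b": 2.0},
}
h_exact = {"a": 3.0, "b": 2.0, "goal": 0.0}  # true cost-to-go
h_guess = {"a": 0.5, "b": 4.0, "goal": 0.0}  # hypothetical guidance scores

print(is_consistent(edges, h_exact, {"goal"}))
print(is_consistent(edges, h_guess, {"goal"}))
```

Multi-heuristic searches in the A* family typically rely on one consistent heuristic for their guarantees and treat additional heuristics purely as guidance, which is why a heuristic that fails this check can still be useful for speed.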
Experimental Evaluation
The paper evaluates the proposed method in simulation across environments of varying complexity. The results show that LLM semantic guidance significantly reduces computation time, both for finding a feasible path and for finding an optimal one, which supports the practicality of the method.





Figure 3: Path cost vs planning time for different AMRA* variants.
Conclusion
This work presents a method for using LLMs in hierarchical task planning over scene graphs. By translating natural language tasks into LTL formulas and searching a structured planning domain, the approach handles semantically complex tasks while optimizing path cost. Future work may focus on improving the heuristics and adding attributes to the hierarchical planning framework, broadening applicability in complex environments.