Optimal Scene Graph Planning with Large Language Model Guidance (2309.09182v2)

Published 17 Sep 2023 in cs.RO

Abstract: Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with semantic concept grounding capabilities to interpret natural language tasks. This work aims to leverage these new capabilities with an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph representation of the environment and utilize an LLM to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.

Summary

The paper integrates LLMs with scene graph planning by translating natural language tasks into LTL formulas for optimized task execution.
It employs a hierarchical planning domain using AMRA* and dual heuristics, significantly reducing computation time while ensuring optimality.
Experimental results in complex semantic maps demonstrate enhanced real-world navigation capabilities for autonomous systems.

Optimal Scene Graph Planning with LLM Guidance

Introduction

The paper integrates LLMs with scene graph planning, with a focus on computing optimal plans in semantic maps. Its central idea is to use an LLM to interpret a task given in natural language and convert it into a Linear Temporal Logic (LTL) specification, which is then solved in a hierarchical planning domain. The goal is to let autonomous robots carry out complex semantic tasks efficiently, which the authors demonstrate in scene graphs of virtualized real environments.

Scene Graphs and Semantic Mapping

Recent advances in robot perception and computer vision have made it possible to build metric-semantic maps: hierarchical models that describe an environment through its semantic and topological relations. These structures are known as scene graphs. A scene graph models elements such as buildings, rooms, and objects in a single hierarchical representation.

The paper uses LLMs to convert natural language instructions into LTL formulas, which connects high-level task specification to navigation in the semantic map.

Figure 1: Planning a natural language mission, $\mu$: "Reach the oven in the kitchen", in a scene graph $G$ of the Gibson environment Benevolence \cite{xiazamirhe2018gibsonenv} with object, room, and floor attributes.

Methodology

Natural Language Translation to Temporal Logic

The method uses an LLM to translate a natural language task into an LTL formula over the scene graph's semantic attributes. A hierarchical attribute representation of the scene graph is constructed and passed to the LLM as input, which conveys the scene configuration compactly. Given this input, the LLM is asked to produce a syntactically correct, co-safe LTL formula, that is, a formula whose satisfaction can be established by a finite path. The formula is then converted into the automaton used for planning.
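The translation step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy scene, the attribute names, the prompt wording, and the formula syntax (`F`, `X`, `U`, `&`, `|`, `!`) are hypothetical and not taken from the paper. The screening function is only a token-level check, not a full LTL parser.

```python
import re

# Hypothetical toy scene graph: floor -> room -> objects (names are illustrative).
SCENE = {"floor_0": {"kitchen_1": ["oven_3", "sink_4"], "bedroom_2": ["bed_5"]}}

def attribute_hierarchy(scene):
    """Serialize the floor/room/object hierarchy as indented text."""
    lines = []
    for floor, rooms in scene.items():
        lines.append(floor)
        for room, objects in rooms.items():
            lines.append(f"  {room}")
            lines.extend(f"    {obj}" for obj in objects)
    return "\n".join(lines)

def propositions(scene):
    """Collect every attribute name that a formula is allowed to mention."""
    props = set()
    for floor, rooms in scene.items():
        props.add(floor)
        for room, objects in rooms.items():
            props.add(room)
            props.update(objects)
    return props

def build_prompt(scene, task):
    """Assemble a prompt; the wording is an assumption, not the paper's prompt."""
    return (
        "Scene attributes:\n" + attribute_hierarchy(scene) + "\n"
        f"Task: {task}\n"
        "Translate the task into a co-safe LTL formula over the attributes above."
    )

# Operators permitted in this sketch; 'G' and 'R' are deliberately excluded.
COSAFE_OPS = {"F", "X", "U", "&", "|", "!", "(", ")"}

def screen_formula(formula, props):
    """Token-level screen: known propositions, allowed operators, '!' only on atoms."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z_0-9]*|[&|!()]", formula)
    if "".join(tokens) != re.sub(r"\s+", "", formula):
        return False  # the formula contains characters outside the toy syntax
    for i, tok in enumerate(tokens):
        if tok == "!":
            if i + 1 >= len(tokens) or tokens[i + 1] not in props:
                return False
        elif tok not in COSAFE_OPS and tok not in props:
            return False
    return True

print(build_prompt(SCENE, "Reach the oven in the kitchen"))
print(screen_formula("F (kitchen_1 & F oven_3)", propositions(SCENE)))
```

A screen of this kind can reject an LLM answer that mentions attributes absent from the scene or uses operators outside the co-safe fragment, so the query can be retried before any automaton is built.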

Hierarchical Planning Domain

The core contribution of this work is a hierarchical planning domain. It captures the attributes and connectivity of the scene graph together with the task automaton, so that progress on the task is tracked during the search. The domain consists of multiple levels corresponding to different semantic attributes (e.g., rooms, objects, floors), allowing the planner to operate at varying resolutions.

The paper employs AMRA*, a multi-resolution multi-heuristic A* variant, to plan over this domain. Searching at multiple resolutions with multiple heuristics keeps planning efficient, while the use of a consistent heuristic retains the optimality guarantee.
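To make the idea of planning jointly over the scene graph and the task automaton concrete, the sketch below runs plain A* over (node, automaton state) pairs. It is not AMRA*: it uses a single resolution and a single heuristic that defaults to zero. The graph, edge costs, labels, and the two-state automaton are all hypothetical.

```python
import heapq

# Hypothetical scene-graph fragment: node -> [(neighbor, edge cost)].
EDGES = {
    "hall": [("kitchen_door", 2.0), ("bedroom", 3.0)],
    "kitchen_door": [("hall", 2.0), ("oven", 1.5)],
    "bedroom": [("hall", 3.0)],
    "oven": [("kitchen_door", 1.5)],
}

# Attributes (atomic propositions) that hold at each node.
LABELS = {
    "hall": frozenset(),
    "kitchen_door": frozenset({"kitchen"}),
    "bedroom": frozenset({"bedroom"}),
    "oven": frozenset({"kitchen", "oven"}),
}

ACCEPTING = {1}

def step(q, label):
    """Toy automaton for 'reach the oven in the kitchen': state 0 -> 1 (accepting)."""
    if q == 0 and {"kitchen", "oven"} <= label:
        return 1
    return q

def plan(start, heuristic=lambda node, q: 0.0):
    """A* over product states (node, automaton state); returns (cost, path) or None."""
    q0 = step(0, LABELS[start])
    frontier = [(heuristic(start, q0), 0.0, start, q0, [start])]
    best = {}
    while frontier:
        _, g, node, q, path = heapq.heappop(frontier)
        if q in ACCEPTING:
            return g, path
        if best.get((node, q), float("inf")) <= g:
            continue  # already expanded this product state at equal or lower cost
        best[(node, q)] = g
        for nxt, cost in EDGES[node]:
            q_next = step(q, LABELS[nxt])
            g_next = g + cost
            f_next = g_next + heuristic(nxt, q_next)
            heapq.heappush(frontier, (f_next, g_next, nxt, q_next, path + [nxt]))
    return None

print(plan("hall"))  # (3.5, ['hall', 'kitchen_door', 'oven'])
```

Because the automaton state is part of the search state, the goal test is simply acceptance by the automaton, which is what lets the same search handle tasks with several ordered subgoals.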

Figure 2: Four-level hierarchical planning domain for Benevolence.

Heuristic Functions

Two distinct heuristic functions are central to this approach:

LTL Heuristic: A provably consistent heuristic derived from the structure of the task automaton. It bounds the remaining cost of a node and provides the optimality guarantee in multi-heuristic planning.
LLM Heuristic: Semantic guidance obtained from an LLM, used to steer the search toward the parts of the scene graph that are relevant to the task and so speed up planning. Because this guidance is potentially inadmissible, it is supplemented by the LTL heuristic rather than used on its own.
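The difference between the two heuristics can be illustrated by checking the consistency condition h(u) <= c(u, v) + h(v) on every edge, with h equal to zero at the goal. The sketch below is an assumption-laden toy: the graph, the lower-bound values, and the "LLM-style" scores are invented for illustration and do not reproduce the paper's heuristics, which are defined over the hierarchical domain and the automaton.

```python
# Hypothetical graph: node -> [(neighbor, edge cost)]; the goal is the oven.
EDGES = {
    "hall": [("kitchen_door", 2.0), ("bedroom", 3.0)],
    "kitchen_door": [("hall", 2.0), ("oven", 1.5)],
    "bedroom": [("hall", 3.0)],
    "oven": [("kitchen_door", 1.5)],
}
GOAL = "oven"

def is_consistent(edges, h, goal, tol=1e-9):
    """True if h(goal) == 0 and h(u) <= c(u, v) + h(v) holds on every edge."""
    if abs(h[goal]) > tol:
        return False
    return all(
        h[u] <= cost + h[v] + tol
        for u, neighbors in edges.items()
        for v, cost in neighbors
    )

# Illustrative lower bounds on the cost-to-go (never above the true cost).
lower_bound = {"hall": 3.0, "kitchen_door": 1.5, "bedroom": 6.0, "oven": 0.0}

# Made-up stand-in for LLM guidance: it heavily penalizes the bedroom because
# the room looks irrelevant to the task, regardless of actual path costs.
llm_style = {"hall": 1.0, "kitchen_door": 0.5, "bedroom": 50.0, "oven": 0.0}

print(is_consistent(EDGES, lower_bound, GOAL))  # True
print(is_consistent(EDGES, llm_style, GOAL))    # False
```

In multi-heuristic search schemes such as MHA*, a consistent heuristic typically serves as the anchor that bounds solution cost, while inadmissible heuristics like the second one are used only to decide where to search first.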

Experimental Evaluation

The paper evaluates the method in simulation on scene graphs of virtualized real environments of varying complexity. The results indicate that semantic guidance from the LLM reduces the computation time needed both to find a first feasible path and to reach the optimal one, which supports the practicality of the approach.

Figure 3: Path cost vs planning time for different AMRA* variants.

Conclusion

This work presents a method for using LLMs in hierarchical task planning over scene graphs. A natural language task is translated into an LTL representation and planned in a structured hierarchical domain, where LLM guidance speeds up the search and a consistent LTL heuristic preserves optimality. Future work may focus on improving the heuristic strategies and expanding the set of attributes in the hierarchical planning framework, broadening applicability to more complex environments.