Learning of Behavior Trees for Autonomous Agents (1504.05811v1)

Published 22 Apr 2015 in cs.RO, cs.AI, and cs.LG

Abstract: Definition of an accurate system model for an Automated Planner (AP) is often impractical, especially for real-world problems. Conversely, off-the-shelf planners fail to scale up and are domain dependent. These drawbacks are inherited from conventional transition systems such as Finite State Machines (FSMs) that describe the action-plan execution generated by the AP. On the other hand, Behavior Trees (BTs) represent a valid alternative to FSMs, presenting many advantages in terms of modularity, reactiveness, scalability and domain-independence. In this paper, we propose a model-free AP framework using Genetic Programming (GP) to derive an optimal BT for an autonomous agent to achieve a given goal in unknown (but fully observable) environments. We illustrate the proposed framework using experiments conducted with the open source Mario AI benchmark, automatically generating BTs that control the game character Mario to complete a level at various levels of difficulty, including enemies and obstacles.

Citations (90)

Summary

The paper introduces a model-free framework that uses genetic programming to evolve behavior trees, enabling autonomous agents to act in unknown but fully observable environments.
The methodology combines greedy algorithms with genetic operators such as crossover and mutation to iteratively build scalable and reactive decision structures.
Preliminary experiments on the Mario AI benchmark indicate that the evolved behavior trees can complete levels of increasing difficulty, including enemies and obstacles.

Overview of "Learning of Behavior Trees for Autonomous Agents"

The paper, by Michele Colledanchise, Ramviyas Parasuraman, and Petter Ögren, investigates Genetic Programming (GP) for learning Behavior Trees (BTs) as a way to address the challenges of Automated Planning (AP) for autonomous agents operating in unknown environments. Noting the limitations of conventional transition systems such as Finite State Machines (FSMs), which are used to execute the action plans that a planner generates, the authors propose BTs for their modularity, reactiveness, scalability, and domain-independence. Using the Mario AI benchmark as a testbed, the paper aims to show that BTs learned with GP can automate decision-making in unknown but fully observable environments.

Key Methodologies

The authors leverage a model-free framework to derive BTs that enable autonomous agents to achieve specified objectives without requiring predefined environmental models. The paper investigates the use of GP as part of a metaheuristic learning strategy, which combines greedy algorithms and GP-based optimization to evolve BTs. The process begins with simple action nodes and iteratively constructs BTs through genetic operators like crossover and mutation, guided by a fitness function that encapsulates the desired goal.
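The evolutionary loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' algorithm: the tuple encoding of trees, the helper names (`paths`, `get`, `replace`, `crossover`, `mutate`, `evolve`), the action names, the size cap used as a crude stand-in for anti-bloat control, and `toy_fitness` are all hypothetical. The population starts from single action nodes, and subtree crossover plus mutation grow larger trees while the best individual found so far is always kept.

```python
import random

# Assumed encoding: a tree is either a leaf action name (str)
# or a tuple (kind, child, child, ...) with kind "sequence" or "selector".

def paths(tree, prefix=()):
    """Index path to every node in the tree; the root is the empty path."""
    result = [prefix]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            result.extend(paths(child, prefix + (i,)))
    return result

def get(tree, path):
    """Return the subtree found by following the index path."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def size(tree):
    """Number of nodes in the tree."""
    return len(paths(tree))

def crossover(a, b, rng):
    """Replace a random subtree of a with a random subtree of b."""
    return replace(a, rng.choice(paths(a)), get(b, rng.choice(paths(b))))

def mutate(tree, actions, rng):
    """Swap a random subtree for a leaf, or wrap it in a new composite node."""
    path = rng.choice(paths(tree))
    leaf = rng.choice(actions)
    if rng.random() < 0.5:
        return replace(tree, path, leaf)
    kind = rng.choice(["sequence", "selector"])
    return replace(tree, path, (kind, get(tree, path), leaf))

def evolve(fitness, actions, generations=20, pop_size=20, max_size=15, seed=0):
    """Evolve a tree that maximizes fitness, starting from single actions."""
    rng = random.Random(seed)
    population = list(actions)            # start from simple action nodes
    best = max(population, key=fitness)   # greedy pick among single actions
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]
        children = []
        while len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            child = mutate(child, actions, rng)
            if size(child) <= max_size:   # crude size cap (assumption)
                children.append(child)
        population = parents + children
        best = max(population + [best], key=fitness)
    return best

def toy_fitness(tree):
    """Hypothetical goal: use many distinct actions, with a small size penalty."""
    leaves = {get(tree, p) for p in paths(tree) if isinstance(get(tree, p), str)}
    return len(leaves) - 0.01 * size(tree)

ACTIONS = ["go_right", "jump", "shoot", "wait"]  # hypothetical action names
best_tree = evolve(toy_fitness, ACTIONS)
```

In a real setting the fitness call would run the candidate tree in the environment and score the episode; the toy fitness here only keeps the sketch self-contained.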

The BTs combine selector, sequence, parallel, and decorator nodes with condition and action nodes. Execution proceeds by ticks that propagate from the root through these nodes, so the tree enforces a decision-making hierarchy. This is analogous to function calls and keeps the structure modular and easy to read. Learning is further refined by anti-bloat control mechanisms that prune redundant nodes and keep the BT small.
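To make the tick mechanism concrete, here is a minimal sketch of the commonly used semantics for sequence and selector nodes. It assumes the usual three return statuses (success, failure, running) and omits parallel and decorator nodes; the class names, the dictionary-based `state`, and the example actions are illustrative choices, not taken from the paper.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"

class Condition:
    """Leaf that checks a predicate on the observed state."""
    def __init__(self, predicate):
        self.predicate = predicate

    def tick(self, state):
        return Status.SUCCESS if self.predicate(state) else Status.FAILURE

class Action:
    """Leaf that issues a command; here it only records the action name."""
    def __init__(self, name):
        self.name = name

    def tick(self, state):
        state.setdefault("log", []).append(self.name)
        return Status.RUNNING

class Sequence:
    """Ticks children left to right; stops at the first that is not SUCCESS."""
    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS

class Selector:
    """Ticks children left to right; stops at the first that is not FAILURE."""
    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE

# Hypothetical tree: jump if an enemy is ahead, otherwise keep moving right.
tree = Selector([
    Sequence([Condition(lambda s: s.get("enemy_ahead", False)), Action("jump")]),
    Action("go_right"),
])
```

Because the whole tree is re-evaluated from the root on every tick, a change in a condition (an enemy appearing) immediately redirects control to a different branch, which is the source of the reactiveness mentioned above.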

Experimental Validation and Results

The authors implemented their methodology using the Mario AI benchmark, which replicates the Super Mario Bros game environment, including levels with varying complexities such as obstacles, enemies, and cliffs. Through a series of experiments, BTs were evolved that enabled Mario to successfully navigate to level endpoints while optimizing additional objectives like enemy elimination and coin collection. The fitness function guiding the GP evaluated parameters such as distance covered, adversaries overcome, and time utilized.
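A fitness function of this kind is often written as a weighted combination of episode statistics. The sketch below assumes a simple linear form; the field names, the weights, and the choice to penalize elapsed time are illustrative assumptions, since the summary does not give the actual formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EpisodeStats:
    """Statistics recorded after running one candidate BT on a level."""
    distance: float        # horizontal distance covered
    enemies_defeated: int  # adversaries overcome
    time_used: float       # time taken during the episode

def fitness(stats, w_distance=1.0, w_enemies=10.0, w_time=0.1):
    """Weighted score: reward progress and defeated enemies, penalize time.

    All weights are placeholder values chosen for illustration only.
    """
    return (w_distance * stats.distance
            + w_enemies * stats.enemies_defeated
            - w_time * stats.time_used)
```

With these placeholder weights, a run that travels farther in the same time always scores higher, so selection pressure pushes toward reaching the end of the level.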

While the paper does not explicitly compare the performance of BTs against state-of-the-art methods, the results obtained through preliminary experiments demonstrate the algorithm's potential to generate effective BTs that increase both the fitness function value and the level of complexity agents can handle.

Implications and Future Work

The implications of this research are twofold. Practically, the findings advocate for the adoption of BTs in autonomous systems tasked with dynamic decision-making in complex environments, where preconstructed models are infeasible. Theoretically, the paper expands on the understanding of how evolutionary strategies can enhance the modular design of decision systems in AI, paving the way for broader applications across robotics and gaming industries.

Future work is expected to include a more comprehensive set of experiments within Mario AI to quantitatively compare the proposed approach against other learning algorithms such as Q-learning and Grammatical Evolution. Additional research could explore dynamic or partially observable environments, extending the applicability of the approach to real-world scenarios. Furthermore, integrating supervised learning could enrich the BT development process by letting agents learn from historical performance data or demonstrations.

In conclusion, this paper contributes a novel approach to learning BTs for autonomous planning without prior domain information, illustrating promising advancements for adaptive intelligent agents across various operational contexts.