Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 134 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 33 tok/s Pro
GPT-5 High 30 tok/s Pro
GPT-4o 86 tok/s Pro
Kimi K2 173 tok/s Pro
GPT OSS 120B 438 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

Room Clearance with Feudal Hierarchical Reinforcement Learning (2105.11328v1)

Published 24 May 2021 in cs.LG and cs.AI

Abstract: Reinforcement learning (RL) is a general framework that allows systems to learn autonomously through trial-and-error interaction with their environment. In recent years combining RL with expressive, high-capacity neural network models has led to impressive performance in a diverse range of domains. However, dealing with the large state and action spaces often required for problems in the real world still remains a significant challenge. In this paper we introduce a new simulation environment, "Gambit", designed as a tool to build scenarios that can drive RL research in a direction useful for military analysis. Using this environment we focus on an abstracted and simplified room clearance scenario, where a team of blue agents have to make their way through a building and ensure that all rooms are cleared of (and remain clear) of enemy red agents. We implement a multi-agent version of feudal hierarchical RL that introduces a command hierarchy where a commander at the higher level sends orders to multiple agents at the lower level who simply have to learn to follow these orders. We find that breaking the task down in this way allows us to solve a number of non-trivial floorplans that require the coordination of multiple agents much more efficiently than the standard baseline RL algorithms we compare with. We then go on to explore how qualitatively different behaviour can emerge depending on what we prioritise in the agent's reward function (e.g. clearing the building quickly vs. prioritising rescuing civilians).

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.