On Efficient Reinforcement Learning for Full-length Game of StarCraft II (2209.11553v1)

Published 23 Sep 2022 in cs.LG and cs.AI

Abstract: StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL), of which the main difficulties include huge state space, varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks. We investigate a curriculum transfer training procedure and train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating level built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the cheating level AIs and achieve the win rate against the level-8, level-9, and level-10 AIs as 96%, 97%, and 94%, respectively. Our codes are at https://github.com/liuruoze/HierNet-SC2. To provide a baseline referring the AlphaStar for our work as well as the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07, which can be trained on the raw action space which has 564 actions. It is designed to run training on a single common machine, by making the hyper-parameters adjustable. We then compare our work with mAS using the same resources and show that our method is more effective. The codes of mini-AlphaStar are at https://github.com/liuruoze/mini-AlphaStar. We hope our study could shed some light on the future research of efficient reinforcement learning on SC2 and other large-scale games.

Summary

The paper presents a hierarchical RL framework that uses a high-level controller and specialized sub-policies to efficiently manage large state and action spaces.
It reports a 99% win rate against the level-1 built-in AI, 93% against level-7, and 94% to 97% against the cheating-level AIs (level-8 to level-10), pointing to improved sample efficiency and faster learning.
By outperforming mini-AlphaStar with limited resources, the study provides actionable benchmarks and open-source tools for advancing reinforcement learning research.

An Overview of Hierarchical Reinforcement Learning Techniques in StarCraft II

This paper explores reinforcement learning (RL) methods for the full-length game of StarCraft II (SC2), presenting a hierarchical approach aimed at the game's main difficulties: a huge state space, a varying action space, and a long time horizon. SC2, known for its complexity and demand for strategic foresight, serves as a natural testbed for RL research. The authors combine a multi-layered architecture with curriculum transfer learning and report win rates above 90% against the built-in AI at level-1, level-7, and the cheating levels 8 to 10.
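The curriculum transfer idea mentioned above can be illustrated with a minimal sketch: train against an easy opponent until a target win rate is reached, then carry the learned parameters over to the next, harder opponent. Everything below is an assumption made for illustration, not the authors' procedure: the function `run_curriculum`, the difficulty schedule `[1, 3, 7]`, the 0.9 threshold, and the toy `toy_train` and `toy_evaluate` functions, which stand in for real RL training and evaluation against SC2's built-in AI.

```python
def run_curriculum(levels, params, train_one_round, evaluate,
                   target_win_rate=0.9, max_rounds=50):
    """Train on each difficulty level in order, reusing parameters across levels.

    Hypothetical helper: `train_one_round(params, level)` returns updated
    parameters and `evaluate(params, level)` returns a win rate in [0, 1].
    """
    history = []
    for level in levels:
        for rounds in range(1, max_rounds + 1):
            params = train_one_round(params, level)
            win_rate = evaluate(params, level)
            if win_rate >= target_win_rate:
                break  # good enough: transfer params to the next level
        history.append((level, rounds, win_rate))
    return params, history


# Toy stand-ins (assumptions): "params" is a single skill number.
def toy_train(skill, level):
    return skill + 1.0  # each training round adds a fixed amount of skill


def toy_evaluate(skill, level):
    return min(1.0, skill / (2 * level))  # harder levels need more skill


final_skill, history = run_curriculum([1, 3, 7], 0.0, toy_train, toy_evaluate)
for level, rounds, win_rate in history:
    print(f"level-{level}: {rounds} rounds, win rate {win_rate:.2f}")
```

The point of the sketch is the control flow: parameters are never reset between levels, so later stages start from what earlier stages learned rather than from scratch.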

Hierarchical and Modular Framework

The authors propose a two-level hierarchy: a high-level controller selects which sub-policy to run, while the sub-policies, each trained for a specific task such as combat or base management, act at a finer time granularity. This modular design shrinks the search space each network has to cover and lets the components be trained separately. Macro-actions, extracted from human expert demonstrations with the PrefixSpan sequential pattern mining algorithm, compress the action space further. Because each decision then selects a whole action sequence rather than a single raw action, the agent makes fewer decisions per game, which improves sample efficiency and speeds up learning.
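The two ideas in this section can be sketched together in a few lines. The block below first mines frequent action subsequences with a compact PrefixSpan implementation (patterns may skip over intervening actions), then wires a controller and sub-policies into a two-level agent. It is a sketch under stated assumptions rather than the authors' code: the action names, the toy `replays` logs, the `TwoLevelAgent` class, and the rule-based lambdas standing in for neural networks are all hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support, max_len=3):
    """Return {pattern: support} for sequential patterns found in at least
    `min_support` sequences. Patterns are subsequences, so gaps are allowed."""
    results = {}

    def mine(prefix, projected):
        # projected: (sequence index, start offset) pairs, one per sequence
        counts = defaultdict(set)
        for sid, start in projected:
            for item in sequences[sid][start:]:
                counts[item].add(sid)
        for item, sids in sorted(counts.items()):
            if len(sids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            if len(pattern) < max_len:
                # Project each sequence onto the suffix after the first match.
                new_projected = []
                for sid, start in projected:
                    seq = sequences[sid]
                    for pos in range(start, len(seq)):
                        if seq[pos] == item:
                            new_projected.append((sid, pos + 1))
                            break
                mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


class TwoLevelAgent:
    """The controller re-selects a sub-policy every `interval` steps;
    the active sub-policy acts on every step."""

    def __init__(self, controller, sub_policies, interval=4):
        self.controller = controller
        self.sub_policies = sub_policies
        self.interval = interval
        self.active = None
        self.steps = 0

    def act(self, obs):
        if self.steps % self.interval == 0:
            self.active = self.controller(obs)
        self.steps += 1
        return self.active, self.sub_policies[self.active](obs)


# Hypothetical expert action logs (assumption, not real replay data).
replays = [
    ["select_worker", "build_pylon", "select_worker", "build_gateway", "train_zealot"],
    ["select_worker", "build_pylon", "train_probe", "select_worker", "build_gateway"],
    ["select_worker", "build_gateway", "train_zealot"],
]
macro_actions = prefixspan(replays, min_support=3)

# Rule-based stand-ins for the learned controller and sub-policies.
agent = TwoLevelAgent(
    controller=lambda obs: "combat" if obs["enemy_visible"] else "base",
    sub_policies={
        "base": lambda obs: ("select_worker", "build_gateway"),
        "combat": lambda obs: ("attack",),
    },
    interval=2,
)
```

With these toy logs, the only multi-step pattern that appears in all three replays is `("select_worker", "build_gateway")`, which is the kind of sequence a macro-action would bundle into one decision. The `interval` parameter reflects the design choice that the controller decides less often than the sub-policies it selects.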

Experimental Results

Training the hierarchical agent on a 64x64 map with a restricted set of units yields a 99% win rate against the level-1 built-in AI. With curriculum transfer learning and a mixture of combat models, the win rate against level-7, the most difficult non-cheating level, reaches 93%. The extended version of the paper integrates a 3-layer hierarchy and achieves win rates of 96%, 97%, and 94% against the cheating-level AIs (level-8, level-9, and level-10, respectively).

Comparison with Mini-AlphaStar

The paper compares its method with mini-AlphaStar (mAS), the authors' scaled-down reproduction of AlphaStar, which is designed to train on a single common machine. Using the same computational resources (4 GPUs and 48 CPU threads), the hierarchical method significantly outperforms mAS across multiple AI levels. The comparison is not like-for-like, however, because the action spaces differ: the hierarchical agent chooses among macro-actions, whereas mAS acts in a raw action space of 564 actions.

Implications and Future Directions

By showing that a complex RTS game can be learned under constrained resources, this research underscores the potential of hierarchical reinforcement learning frameworks. The open-source release of both the authors' code and mAS offers useful benchmarks and tools for the RL community.

Conclusion

This work contributes to research on efficient RL by advancing hierarchical architectures and macro-action abstractions in SC2. These techniques may extend to other multi-agent and strategic domains, and they point toward further work on resource-efficient learning.
