
Gym-$μ$RTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning (2105.13807v3)

Published 21 May 2021 in cs.LG

Abstract: In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that could defeat professional players in StarCraft~II. However, existing approaches to tackle full games have high computational costs, usually requiring the use of thousands of GPUs and CPUs for weeks. This paper has two main contributions to address this issue: 1) We introduce Gym-$\mu$RTS (pronounced "gym-micro-RTS") as a fast-to-run RL environment for full-game RTS research and 2) we present a collection of techniques to scale DRL to play full-game $\mu$RTS as well as ablation studies to demonstrate their empirical importance. Our best-trained bot can defeat every $\mu$RTS bot we tested from the past $\mu$RTS competitions when working in a single-map setting, resulting in a state-of-the-art DRL agent while only taking about 60 hours of training using a single machine (one GPU, three vCPU, 16GB RAM). See the blog post at https://wandb.ai/vwxyzjn/gym-microrts-paper/reports/Gym-RTS-Toward-Affordable-Deep-Reinforcement-Learning-Research-in-Real-Time-Strategy-Games--Vmlldzo2MDIzMTg and the source code at https://github.com/vwxyzjn/gym-microrts-paper

Authors (4)
  1. Shengyi Huang (16 papers)
  2. Chris Bamford (7 papers)
  3. Lukasz Grela (1 paper)
  4. Santiago Ontañón (28 papers)
Citations (31)

Summary

An In-Depth Analysis of Gym-μRTS and Its Role in Reinforcement Learning for RTS Games

The research paper titled "Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning" presents notable advancements in the field of Deep Reinforcement Learning (DRL) applied to Real-time Strategy (RTS) games. The work specifically focuses on reducing the computational cost of training competitive AI agents for full RTS games while proposing a new framework, Gym-μRTS, for this purpose.

Computational Challenges in RTS Game Research

The application of DRL to RTS games has achieved significant milestones, exemplified by DeepMind's AlphaStar, which reached Grandmaster level performance in StarCraft II. However, these achievements necessitate massive computational resources, typically involving thousands of GPUs and CPUs over prolonged periods. This poses a substantial barrier to entry for researchers without access to such resources, thereby restricting broader exploration and development in this domain.

Contributions of Gym-μRTS

This paper seeks to address these challenges with two main contributions:

  1. Introduction of Gym-μRTS: An efficient, lightweight RL environment designed for full-game RTS research that encapsulates all fundamental game aspects (e.g., resource harvesting, defense, and attack) without the prohibitive computational burden observed in more complex games like StarCraft II.
  2. DRL Techniques and Ablation Studies: The paper presents a collection of DRL techniques tailored to controlling agents in μRTS, along with ablation studies that quantify each technique's contribution. The resulting bot, trained for about 60 hours on a single machine (one GPU, three vCPUs, 16 GB RAM), defeats every past-competition μRTS bot the authors tested in a single-map setting.
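The environment follows the standard Gym interface: `reset()` returns an initial observation, and `step(action)` returns an observation, reward, done flag, and info dict. The sketch below is a minimal, self-contained illustration of that loop using a toy stand-in class; the real gym-microrts package wraps the Java MicroRTS engine, and the class and reward here are purely illustrative.

```python
import random

class DummyRTSEnv:
    """Toy stand-in for a Gym-style environment (reset/step API).
    Only mimics the interface shape of gym-microrts; the real
    environment returns spatial observations from the game engine."""
    def __init__(self, episode_length=8):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 4  # placeholder observation vector

    def step(self, action):
        self.t += 1
        obs = [float(self.t)] * 4
        reward = 1.0 if action == 0 else 0.0  # toy shaped reward
        done = self.t >= self.episode_length
        return obs, reward, done, {}

def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

ret = run_episode(DummyRTSEnv(), policy=lambda obs: random.choice([0, 1]))
```

In the real environment, `policy` would be the trained PPO network and the observation would be a stacked feature-plane tensor of the map.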

Methodological Innovations

The investigation leverages Proximal Policy Optimization (PPO), a robust policy gradient method, enhanced through several key innovations:

  • Action Composition and Invalid Action Masking: These techniques reduce the combinatorial complexity of the action space in μRTS, allowing for more efficient DRL execution. By masking invalid actions and composing actions, the system can focus learning efforts on viable strategies.
  • Neural Network Architectures for Policy Representation: The researchers explored various architectures, including Nature-CNN, Impala-CNN, and encoder-decoder networks, to adapt to the game's demands effectively.
  • Training with Diversified Opponents: By training against a variety of bots, the DRL agent learns to generalize across different strategies, crucial for its effective functioning in competitive scenarios.
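Invalid action masking is typically implemented by replacing the logits of invalid actions with a large negative value before the softmax, so those actions receive (near-)zero probability and, during training, contribute (near-)zero gradient through the renormalized policy. A minimal NumPy sketch, assuming the validity mask is supplied by the game engine:

```python
import numpy as np

def masked_softmax(logits, mask):
    """Invalid action masking: push logits of invalid actions to a large
    negative value so the softmax assigns them ~zero probability."""
    masked = np.where(mask, logits, -1e8)
    z = masked - masked.max()   # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, -0.3])
mask = np.array([True, False, True, False])  # engine marks actions 1, 3 invalid
probs = masked_softmax(logits, mask)
```

The same idea applies per component when actions are composed: each component head gets its own mask (e.g., a "move" direction head is fully masked when the chosen action type is not "move").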

Results and Implications

The experiments confirm that Gym-μRTS is a viable platform for RTS game research, facilitating the training of competitive agents on limited hardware within a reasonable timeframe. This democratizes access to cutting-edge AI research in complex environments, expanding opportunities for innovation and development in smaller academic settings and industry labs with constrained resources.

Moreover, the findings highlight the potential of action composition and invalid action masking in managing vast action spaces. These could be extended to more complex settings, potentially informing the development of more resource-friendly strategies in existing frameworks like PySC2.
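To see why action composition tames the action space: instead of one flat categorical over every combination of action parameters, the policy samples each component from its own small distribution, and the probability of the composed action is the product of component probabilities. The component sizes below are illustrative of a μRTS-like per-unit action (action type, four direction parameters, produce unit type, relative attack position):

```python
import math

# Illustrative per-unit action components for a μRTS-like game:
# action type, move dir, harvest dir, return dir, produce dir,
# produce unit type, relative attack position (7x7 window).
components = [6, 4, 4, 4, 4, 7, 49]

flat_size = math.prod(components)   # one flat head over all combinations
composed_size = sum(components)     # sum of small per-component heads

def composed_prob(component_probs):
    """Probability of a composed action = product of component probabilities."""
    p = 1.0
    for q in component_probs:
        p *= q
    return p
```

The flat parameterization needs hundreds of thousands of output logits per unit, while the composed one needs only a few dozen, which is what makes learning tractable on modest hardware.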

Future Prospects

The paper opens new avenues for further exploration:

  • Generalization to Diverse Maps and Partial Observability: Potential extensions include adapting agents to perform across varied environments and implementing real-time decision-making under uncertain conditions (e.g., fog-of-war scenarios).
  • Advanced Self-play and Learning without Human-designed Bots: While the current paper demonstrates substantial efficacy, employing advanced self-play techniques might further enhance the DRL agent's strategic capabilities while aiming to reduce reliance on predefined bot strategies.

In conclusion, the Gym-μRTS framework represents a significant step forward in making game AI research more accessible and affordable, serving as a pivotal tool for academics and industry professionals aiming to explore and innovate within the field of DRL and RTS games.
