
Abstract

Multi-agent reinforcement learning (MARL) has achieved encouraging performance on complex multi-agent tasks. However, the safety of MARL policies remains a critical concern that impedes their real-world application. Moreover, popular multi-agent benchmarks offer limited support for safe MARL research: negative rewards for collisions are insufficient to guarantee the safety of MARL policies. Therefore, in this work we propose a new safety-constrained multi-agent environment, MatrixWorld, based on the general pursuit-evasion game. In particular, we propose a safety-constrained multi-agent action execution model for the software implementation of safe multi-agent environments. In addition, MatrixWorld is a lightweight co-evolution framework for learning pursuit tasks, evasion tasks, or both, with additional pursuit-evasion variants designed around different practical meanings of safety. As a brief survey, we review and analyze the co-evolution mechanism in the multi-agent setting, which clarifies its relationships with autocurricula, self-play, arms races, and adversarial learning. On this basis, we argue that MatrixWorld can serve as the first environment for autocurriculum research in which ideas can be quickly verified and well understood. Finally, motivated by the above problems in safe MARL and autocurricula, our experiments demonstrate the difficulty general MARL has in guaranteeing safe multi-agent coordination with only negative rewards for collisions, as well as the potential of MatrixWorld for autocurriculum learning, and we give practical suggestions for successful multi-agent adversarial learning and arms races.
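To make the contrast concrete, the sketch below shows one way a safety-constrained action execution model can reject unsafe joint actions at execution time, rather than only penalizing collisions through the reward. This is a minimal, hypothetical gridworld illustration: the function `safe_step`, the `positions`/`MOVES` data layout, and the conflict-resolution order are assumptions made for this example and do not reflect MatrixWorld's actual interface.

```python
# Illustrative sketch (not the MatrixWorld API): safety-constrained action
# execution in a gridworld pursuit-evasion setting. A proposed move that would
# leave the grid or enter an occupied cell is blocked (the agent stays put),
# so collisions cannot occur, instead of merely being penalized in the reward.

from typing import Dict, Tuple

Pos = Tuple[int, int]
MOVES: Dict[str, Pos] = {
    "stay": (0, 0), "up": (-1, 0), "down": (1, 0),
    "left": (0, -1), "right": (0, 1),
}

def safe_step(positions: Dict[str, Pos],
              actions: Dict[str, str],
              grid_size: Tuple[int, int]) -> Dict[str, Dict]:
    """Resolve joint actions so that no two agents end up in the same cell."""
    rows, cols = grid_size
    resolved: Dict[str, Pos] = {}
    occupied = set(positions.values())  # cells occupied before this step
    info = {agent: {"collision_blocked": False} for agent in positions}

    # Resolve agents in a fixed order. Conservative rule: other agents'
    # current cells and all already-resolved cells count as occupied.
    for agent in sorted(positions):
        r, c = positions[agent]
        dr, dc = MOVES[actions[agent]]
        target = (r + dr, c + dc)

        in_bounds = 0 <= target[0] < rows and 0 <= target[1] < cols
        blocked = (occupied - {positions[agent]}) | set(resolved.values())

        if in_bounds and target not in blocked:
            resolved[agent] = target
        else:
            # Unsafe move: block it and keep the agent in place.
            resolved[agent] = positions[agent]
            info[agent]["collision_blocked"] = True

    return {"positions": resolved, "info": info}

# Example: the pursuer's move into the evader's cell is blocked, so both
# agents keep their positions and the blocked move is reported via `info`.
out = safe_step({"pursuer_0": (0, 0), "evader_0": (0, 1)},
                {"pursuer_0": "right", "evader_0": "stay"}, (5, 5))
```

A reward-penalty-only environment would instead apply the colliding move (or terminate the episode) and subtract a penalty, which, as the abstract notes, does not by itself guarantee safe coordination.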
