Emergent Coordination Through Competition

(arXiv:1902.07151)
Published Feb 19, 2019 in cs.AI

Abstract

We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation. Our study highlights several of the challenges encountered in large scale multi-agent training in continuous control. In particular, we demonstrate that the automatic optimization of simple shaping rewards, not themselves conducive to cooperative behavior, can lead to long-horizon team behavior. We further apply an evaluation scheme, grounded by game theoretic principles, that can assess agent performance in the absence of pre-defined evaluation tasks or human baselines.

Overview

  • The study explores cooperative behavior emergence in agents using reinforcement learning within a competitive multi-agent soccer environment.

  • Agents learn to act in a complex space with realistic physics, provided by the MuJoCo engine, advancing from simple chasing to sophisticated cooperation.

  • Decentralized population-based training of independent learners allows cooperative behavior to emerge without centralized control by incorporating competitive match results into automatically optimized shaping rewards.

  • Agent performance is evaluated with game-theoretic tools, such as Nash averaging, that address the complexity of multi-agent interactions.

  • Agents demonstrated advanced team play and strategic behavior without explicit cooperation instructions, indicating potential applications beyond the digital soccer environment.

Methodology

The emergence of cooperative behavior among agents trained through reinforcement learning (RL) is an intriguing phenomenon, particularly in competitive domains like sports. A study by DeepMind explores this by introducing an intricate multi-agent soccer environment rooted in continuous control and competition. Central to this research is reinforcement learning, in which agents learn by trial and error, receiving rewards for successful actions. In this setting, agents progressed from random motion to simple ball chasing, eventually displaying sophisticated cooperation.

The environment is built on MuJoCo, a physics engine that provides a consistent world governed by simulated physical laws. The agents have simple bodies but operate in a physically realistic space that can accommodate multiple agents and, in principle, more complex embodiments. Reinforcement learning in such an environment faces challenges, including the difficulty of coordination and the need for shaping rewards to guide agents' behavior.
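A multi-agent soccer task of this kind was later open-sourced in DeepMind's dm_control suite. The sketch below shows roughly how such a task can be loaded and stepped with random actions; the module path, `dm_soccer.load`, and its arguments reflect my understanding of that release and should be treated as assumptions rather than the paper's exact configuration.

```python
import numpy as np
from dm_control.locomotion import soccer as dm_soccer

# Load a 2-vs-2 soccer task (assumed API of the open-source release;
# team_size and time_limit may differ from the paper's setup).
env = dm_soccer.load(team_size=2, time_limit=10.0)

action_specs = env.action_spec()  # one continuous action spec per player

timestep = env.reset()
while not timestep.last():
    # Untrained agents: sample each player's action uniformly within its bounds.
    actions = [
        np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
        for spec in action_specs
    ]
    timestep = env.step(actions)  # observations and rewards are per player
```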

Specifically, the researchers focused on decentralized population-based training (PBT) of independent RL learners. This approach diverges from centralized training, where information shared among agents underpins coordinated behavior. Instead, the study demonstrates that PBT can give rise to cooperation without such centralization. By feeding competitive match results into the optimization of shaping rewards, the system lets agents' behaviors evolve organically. Moreover, the study decomposes the reward into separate channels, each with its own discount factor that is optimized online, steering agents toward longer-horizon team objectives over time.
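To make the reward decomposition concrete, here is a minimal sketch of a return computed as a sum of per-channel discounted returns, each channel with its own discount factor. The channel names and the idea that PBT would mutate the per-channel discounts online are illustrative assumptions, not the paper's exact formulation or code.

```python
import numpy as np

def multi_channel_return(reward_channels, discounts):
    """Sum of per-channel discounted returns over one episode.

    reward_channels: dict mapping channel name -> array of per-step rewards.
    discounts: dict mapping channel name -> that channel's discount factor
               (the quantity PBT would tune online in this sketch).
    """
    total = 0.0
    for name, rewards in reward_channels.items():
        gammas = discounts[name] ** np.arange(len(rewards))
        # Discounted sum: r_0 + gamma * r_1 + gamma^2 * r_2 + ...
        total += float(np.dot(gammas, rewards))
    return total

# Hypothetical channels: a sparse scoring signal plus a dense shaping signal.
episode = {
    "scoring":      np.array([0.0, 0.0, 0.0, 1.0]),  # goal on the last step
    "ball_to_goal": np.array([0.1, 0.2, 0.3, 0.4]),  # ball velocity toward goal
}
discounts = {"scoring": 0.99, "ball_to_goal": 0.8}    # per-channel, PBT-tuned
print(multi_channel_return(episode, discounts))
```

A small per-channel discount keeps a dense shaping signal myopic, while a discount near one lets the sparse scoring signal dominate over long horizons, which is the mechanism the paper credits for the shift toward team play.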

Emergent Behavior

The study provides more than a method; it charts the progression of agent behavior. Initially, agents simply chased the ball, driven by simple rewards. As training progressed, they began demonstrating tactics indicative of an understanding of the game's wider context, such as passing the ball and positioning themselves strategically.

The researchers evaluated agent performance with schemes rooted in game theory. Traditional metrics such as win-loss records against pre-designed bots or human baselines are unavailable here, so they turned to Nash averaging. This approach accounts for non-transitive relationships between agents, where, for example, Agent A beats Agent B, Agent B beats Agent C, yet Agent C defeats Agent A, a cycle that highlights the complex dynamics that can emerge in multi-agent systems.
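As a rough illustration of why a Nash-based evaluation handles such cycles, the sketch below approximates a Nash mixture over agents from an antisymmetric payoff matrix and then rates each agent by its expected payoff against that mixture. Fictitious play is used here as a simple stand-in for the maximum-entropy Nash solver that Nash averaging proper calls for, and the three-agent cycle is a made-up example.

```python
import numpy as np

def approx_nash_mixture(A, iters=20000):
    """Approximate a Nash mixture of the symmetric zero-sum game with
    antisymmetric payoff matrix A (A[i, j] = payoff of agent i vs agent j)
    via fictitious play. This finds *a* Nash mixture, not necessarily the
    maximum-entropy one used by Nash averaging proper."""
    counts = np.ones(A.shape[0])
    for _ in range(iters):
        mixture = counts / counts.sum()
        counts[np.argmax(A @ mixture)] += 1  # best response to the empirical mixture
    return counts / counts.sum()

def nash_ratings(A):
    """Rate each agent by its expected payoff against the Nash mixture."""
    return A @ approx_nash_mixture(A)

# A non-transitive cycle: A beats B, B beats C, C beats A (payoffs in [-1, 1]).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
print(nash_ratings(A))  # all near 0: no agent dominates under the Nash mixture
```

Rating agents against the Nash mixture rather than against an arbitrary fixed pool prevents an evaluation from being skewed by which opponents happen to be included.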

Results and Implications

The results are striking. Agents trained under these conditions showed increasingly coordinated and strategic behavior without any explicit instruction to cooperate. They were not only learning how to win but also how to play as a team, a notable step for multi-agent AI.

The findings of this research have broader implications beyond a digital game of soccer. They contribute to understanding how complex cooperative strategies can form in social and economic systems, how teams can optimize performance, and how decentralized systems can evolve and adapt without central oversight.

Conclusion

The study deepens our understanding of AI, particularly how independent agents can develop cooperation in the midst of competition. This exploration into multi-agent reinforcement learning shows that, with carefully designed training environments and methodologies, complex cooperative behaviors can emerge spontaneously. Future work may explore larger agent populations, more complex scenarios, and potential real-world applications of these AI "soccer players."
