Emergent Coordination Through Competition

(arXiv:1902.07151)
Published Feb 19, 2019 in cs.AI

Abstract

We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation. Our study highlights several of the challenges encountered in large scale multi-agent training in continuous control. In particular, we demonstrate that the automatic optimization of simple shaping rewards, not themselves conducive to cooperative behavior, can lead to long-horizon team behavior. We further apply an evaluation scheme, grounded by game theoretic principles, that can assess agent performance in the absence of pre-defined evaluation tasks or human baselines.

Overview

  • The study explores cooperative behavior emergence in agents using reinforcement learning within a competitive multi-agent soccer environment.

  • Agents learn to act in a complex space with realistic physics, provided by the MuJoCo engine, advancing from simple chasing to sophisticated cooperation.

  • Decentralized population-based training of independent learners allows cooperative behavior to emerge without centralized control by incorporating competitive match results into automatically optimized shaping rewards.

  • Agent performance is evaluated with game-theoretic tools, such as Nash averaging, that address the complexity of multi-agent interactions.

  • Agents demonstrated advanced team play and strategic behavior without explicit cooperation instructions, indicating potential applications beyond the digital soccer environment.

Methodology

The emergence of cooperative behavior among agents trained through reinforcement learning (RL) is an intriguing phenomenon, particularly in competitive domains like sports. A study by DeepMind explores this by introducing an intricate multi-agent soccer environment rooted in continuous control and competition. Central to this research is reinforcement learning, in which agents learn by trial and error, receiving rewards for successful actions. In this setting, agents progressed from random motion to simple ball chasing, eventually displaying sophisticated cooperation.

The environment is built on MuJoCo, a physics engine that provides a consistent world governed by simulated physical laws. The agents have simple bodies but operate in a physically realistic space that can accommodate multiple agents and, in principle, more complex embodiments. Reinforcement learning in such an environment faces challenges, including the difficulty of coordination and the need for shaping rewards to guide agents' behavior.
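A multi-agent soccer task of this kind was later open-sourced in DeepMind's dm_control suite. The sketch below shows roughly how such a task can be loaded and stepped with random actions; the module path, `dm_soccer.load`, and its arguments reflect my understanding of that release and should be treated as assumptions rather than the paper's exact configuration.

```python
import numpy as np
from dm_control.locomotion import soccer as dm_soccer

# Load a 2-vs-2 soccer task (assumed API of the open-source release;
# team_size and time_limit may differ from the paper's setup).
env = dm_soccer.load(team_size=2, time_limit=10.0)

action_specs = env.action_spec()  # one continuous action spec per player

timestep = env.reset()
while not timestep.last():
    # Untrained agents: sample each player's action uniformly within its bounds.
    actions = [
        np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
        for spec in action_specs
    ]
    timestep = env.step(actions)  # observations and rewards are per player
```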

Specifically, the researchers focused on decentralized population-based training (PBT) of independent RL learners. This approach diverges from centralized training, where information shared among agents underpins coordinated behavior. Instead, the study demonstrates that PBT can give rise to cooperation without such centralization. By feeding competitive match results into the optimization of shaping rewards, the system lets agents' behaviors evolve organically. Moreover, the study decomposes the reward into separate channels, each with its own discount factor that is optimized online, steering agents toward longer-horizon team objectives over time.
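To make the reward decomposition concrete, here is a minimal sketch of a return computed as a sum of per-channel discounted returns, each channel with its own discount factor. The channel names and the idea that PBT would mutate the per-channel discounts online are illustrative assumptions, not the paper's exact formulation or code.

```python
import numpy as np

def multi_channel_return(reward_channels, discounts):
    """Sum of per-channel discounted returns over one episode.

    reward_channels: dict mapping channel name -> array of per-step rewards.
    discounts: dict mapping channel name -> that channel's discount factor
               (the quantity PBT would tune online in this sketch).
    """
    total = 0.0
    for name, rewards in reward_channels.items():
        gammas = discounts[name] ** np.arange(len(rewards))
        # Discounted sum: r_0 + gamma * r_1 + gamma^2 * r_2 + ...
        total += float(np.dot(gammas, rewards))
    return total

# Hypothetical channels: a sparse scoring signal plus a dense shaping signal.
episode = {
    "scoring":      np.array([0.0, 0.0, 0.0, 1.0]),  # goal on the last step
    "ball_to_goal": np.array([0.1, 0.2, 0.3, 0.4]),  # ball velocity toward goal
}
discounts = {"scoring": 0.99, "ball_to_goal": 0.8}    # per-channel, PBT-tuned
print(multi_channel_return(episode, discounts))
```

A small per-channel discount keeps a dense shaping signal myopic, while a discount near one lets the sparse scoring signal dominate over long horizons, which is the mechanism the paper credits for the shift toward team play.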

Emergent Behavior

The study provides more than a method; it charts the progression of agent behavior. Initially, agents simply chased the ball, driven by simple rewards. As training progressed, they began demonstrating tactics indicative of an understanding of the game's wider context, such as passing the ball and positioning themselves strategically.

The researchers evaluated agent performance with schemes rooted in game theory. Traditional metrics such as win-loss records against pre-designed bots or human baselines are unavailable here, so they turned to Nash averaging. This approach accounts for non-transitive relationships between agents, where, for example, Agent A beats Agent B, Agent B beats Agent C, yet Agent C defeats Agent A, a cycle that highlights the complex dynamics that can emerge in multi-agent systems.
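As a rough illustration of why a Nash-based evaluation handles such cycles, the sketch below approximates a Nash mixture over agents from an antisymmetric payoff matrix and then rates each agent by its expected payoff against that mixture. Fictitious play is used here as a simple stand-in for the maximum-entropy Nash solver that Nash averaging proper calls for, and the three-agent cycle is a made-up example.

```python
import numpy as np

def approx_nash_mixture(A, iters=20000):
    """Approximate a Nash mixture of the symmetric zero-sum game with
    antisymmetric payoff matrix A (A[i, j] = payoff of agent i vs agent j)
    via fictitious play. This finds *a* Nash mixture, not necessarily the
    maximum-entropy one used by Nash averaging proper."""
    counts = np.ones(A.shape[0])
    for _ in range(iters):
        mixture = counts / counts.sum()
        counts[np.argmax(A @ mixture)] += 1  # best response to the empirical mixture
    return counts / counts.sum()

def nash_ratings(A):
    """Rate each agent by its expected payoff against the Nash mixture."""
    return A @ approx_nash_mixture(A)

# A non-transitive cycle: A beats B, B beats C, C beats A (payoffs in [-1, 1]).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
print(nash_ratings(A))  # all near 0: no agent dominates under the Nash mixture
```

Rating agents against the Nash mixture rather than against an arbitrary fixed pool prevents an evaluation from being skewed by which opponents happen to be included.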

Results and Implications

The results are striking. Agents trained under these conditions showed increasingly coordinated and strategic behavior without any explicit instruction to cooperate. They were not only learning how to win but also how to play as a team, a notable step for multi-agent AI.

The findings of this research have broader implications beyond a digital game of soccer. They contribute to understanding how complex cooperative strategies can form in social and economic systems, how teams can optimize performance, and how decentralized systems can evolve and adapt without central oversight.

Conclusion

The study deepens our understanding of AI, particularly how independent agents can develop cooperation in the midst of competition. This exploration into multi-agent reinforcement learning shows that, with carefully designed training environments and methodologies, complex cooperative behaviors can emerge spontaneously. Future work may explore larger agent populations, more complex scenarios, and potential real-world applications of these AI "soccer players."
