
"Other-Play" for Zero-Shot Coordination (2003.02979v3)

Published 6 Mar 2020 in cs.AI

Abstract: We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP), that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.

Citations (191)

Summary

  • The paper introduces the Other-Play algorithm that leverages symmetry principles to enhance zero-shot coordination in multi-agent reinforcement learning.
  • It demonstrates that OP-trained agents outperform self-play agents in cooperative tasks, notably achieving higher scores in Hanabi through effective cross-play.
  • Empirical results and theoretical analysis confirm OP's potential for real-world applications requiring adaptable human-AI collaboration.

"Other-Play" for Zero-Shot Coordination: An Academic Overview

The paper "Other-Play for Zero-Shot Coordination" by Hengyuan Hu, Adam Lerer, Alex Peysakhovich, and Jakob Foerster addresses the substantial challenge of constructing AI agents capable of effectively coordinating with novel partners they have not interacted with before. This paper situates its research within the domain of Multi-Agent Reinforcement Learning (MARL), extending previous methodologies to address limitations in agents trained solely by self-play (SP).

Problem and Motivation

In standard MARL settings, self-play is a commonly employed method in which agents optimize their strategies through repeated interaction with copies of themselves. Although effective for finding equilibrium strategies in zero-sum games, SP falls short in cooperative settings that require coordination with unfamiliar partners. The shortfall arises because SP agents often learn highly specialized, arbitrary conventions that a novel partner cannot be expected to share, leaving them poorly equipped for zero-shot coordination, where agents must cooperate successfully on first contact.

Key Contributions

  1. Introduction of the Other-Play (OP) Algorithm: The paper introduces OP, a modification of self-play that leverages known symmetries of the environment. OP aims to produce robust strategies by encouraging agents to account for the variety of symmetry-equivalent strategies their partners might employ, rather than locking onto one arbitrary convention.
  2. Theoretical Underpinnings: The authors characterize OP theoretically, showing that it corresponds to maximizing expected return when the partner plays a randomly chosen symmetry-equivalent version of the same policy (a minimal sketch of this objective follows the list). OP is further framed as a meta-equilibrium: strategies trained with OP remain optimal when matched with other OP-trained agents.
  3. Empirical Validation: Through empirical evaluation in the cooperative card game Hanabi, OP agents showcased enhanced performance when paired with independently trained agents, outperforming SP agents. Moreover, OP agents demonstrated promising coordination abilities in preliminary tests with human participants.
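To make the OP objective concrete, the following Python sketch estimates it by Monte Carlo. The env.play and phi(policy) interfaces are hypothetical placeholders introduced only for illustration, not the authors' implementation; the sketch assumes the game's symmetries are supplied as callable relabelling operators.

    import random

    def other_play_value(env, policy, symmetries, episodes=100):
        """Monte Carlo estimate of the other-play objective for a shared policy.

        Hypothetical interfaces, for illustration only:
          - env.play(pi_1, pi_2): runs one cooperative episode and returns
            the joint return.
          - phi(policy): returns the policy with its observations and actions
            relabelled by the symmetry phi.

        Self-play would evaluate env.play(policy, policy); other-play instead
        pairs the policy with a randomly relabelled copy of itself, so the
        learned strategy cannot rely on arbitrary, symmetry-breaking
        conventions.
        """
        total = 0.0
        for _ in range(episodes):
            phi = random.choice(symmetries)          # sample a symmetry of the game
            total += env.play(policy, phi(policy))   # partner sees a relabelled world
        return total / episodes

Training with OP then amounts to ascending this objective instead of the plain self-play return.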

Numerical Results and Claims

The paper reports experimental results from a tabular environment, the lever coordination game, and from the more complex, partially observable card game Hanabi. In both settings, OP agents avoided the coordination failures exhibited by SP agents: in Hanabi, OP agents achieved substantially higher cross-play scores with independently trained agents, and in a preliminary human study OP-trained agents averaged 15.75 points (s.e.m. 1.23) with human partners versus 9.15 (s.e.m. 1.18) for SP agents.
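The lever coordination game makes the gap between SP and OP concrete. The sketch below is a toy illustration, not the authors' code; it assumes the payoff structure described in the paper (ten levers, nine worth 1.0 and one worth 0.9, with reward only when both players pull the same lever).

    import itertools

    # Toy lever coordination game: both players simultaneously pick one of
    # ten levers and score that lever's payoff only if their choices match.
    # Nine levers pay 1.0 and are interchangeable under the game's symmetry;
    # the single 0.9 lever is uniquely identifiable.
    PAYOFFS = [1.0] * 9 + [0.9]

    def joint_return(a1, a2):
        return PAYOFFS[a1] if a1 == a2 else 0.0

    # Self-play: each independently trained agent converges to some arbitrary
    # 1.0 lever, so in cross-play the two arbitrary choices rarely match.
    sp_conventions = list(range(9))  # each SP run settles on one 1.0 lever
    sp_crossplay = sum(
        joint_return(a1, a2)
        for a1, a2 in itertools.product(sp_conventions, repeat=2)
    ) / len(sp_conventions) ** 2
    print(f"Expected SP cross-play return: {sp_crossplay:.3f}")  # ~0.111

    # Other-play: the only convention that survives random relabelling of the
    # partner is the uniquely identifiable 0.9 lever, which coordinates reliably.
    op_lever = 9
    print(f"OP cross-play return: {joint_return(op_lever, op_lever):.3f}")  # 0.900

The arbitrary conventions that maximize the self-play score are exactly what make cross-play fail, which is the failure mode OP is designed to remove.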

Implications and Future Work

Practically, the OP framework could be highly impactful in domains requiring seamless human-AI collaboration, such as autonomous driving, where agents are continually faced with novel situations. Theoretically, OP adds an innovative perspective to cooperative learning by explicitly considering symmetries, a strategy that has not been fully explored in previous MARL research.

Future developments might involve extending OP's underlying principles to environments where the symmetry sets are not predetermined but must be discovered algorithmically. Extending OP to dynamic, real-world environments could substantially advance the practicality of AI agents coordinating effectively without prior interaction with their partners.

Conclusion

The research provides a compelling contribution to the MARL field by addressing the zero-shot coordination challenge through the Other-Play methodology. By innovatively leveraging environmental symmetries, this work paves the way for AI systems that can more naturally and effectively work with diverse, unfamiliar partners. The findings from this paper encourage a reconsideration of coordination mechanisms in multi-agent environments and offer a robust foundation for addressing similar challenges in future research.
