DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization (2207.05631v3)
Abstract: Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or to improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately constrains the diversity of the strategies and the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards while discovering more diverse strategies, often with better sample efficiency.
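To make the idea of an information-theoretic intrinsic reward and an alternating constraint more concrete, here is a minimal sketch. It is not the paper's exact formulation: the reward form (log q(z|s) - log p(z) with a uniform prior over strategies, in the style of mutual-information skill discovery methods such as DIAYN), the switching rule, and all names (`diversity_reward`, `select_reward`, `diversity_threshold`) are assumptions introduced for illustration only.

```python
import math


def diversity_reward(discriminator_probs, strategy_id):
    """Assumed intrinsic reward: log q(z|s) - log p(z).

    discriminator_probs: hypothetical discriminator output q(.|s), one
        probability per strategy, for the current state s.
    strategy_id: index z of the strategy the shared policy is conditioned on.
    The prior p(z) is assumed uniform over the strategies.
    """
    num_strategies = len(discriminator_probs)
    # Clamp to avoid log(0) when the discriminator is very confident.
    q = max(discriminator_probs[strategy_id], 1e-8)
    return math.log(q) - math.log(1.0 / num_strategies)


def select_reward(extrinsic, intrinsic, diversity_estimate, diversity_threshold):
    """Hypothetical switching rule for the alternating constraint.

    While the estimated diversity is below the threshold, train on the
    intrinsic (diversity) reward; once it is satisfied, train on the
    extrinsic (task) reward.
    """
    if diversity_estimate < diversity_threshold:
        return intrinsic
    return extrinsic


if __name__ == "__main__":
    # A state the discriminator attributes mostly to strategy 0.
    probs = [0.7, 0.1, 0.1, 0.1]
    r_int = diversity_reward(probs, strategy_id=0)
    print("intrinsic reward:", r_int)
    print("training reward:", select_reward(1.0, r_int, 0.2, 0.5))
```

Under these assumptions, a state that the discriminator can confidently attribute to the active strategy earns a positive intrinsic reward, while a state that is indistinguishable across strategies earns zero. A single policy network conditioned on `strategy_id` would then be trained on the selected reward with an on-policy method.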