
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization (2207.05631v3)

Published 12 Jul 2022 in cs.LG and cs.AI

Abstract: Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or to improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately constrains the diversity of the strategies and the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards while discovering more diverse strategies, often with better sample efficiency.
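To make the idea of an information-theoretic intrinsic reward concrete, here is a minimal sketch of one common form of such a reward: a discriminator tries to identify which strategy (a latent index z) produced a state, and the policy is rewarded with log q(z|s) - log p(z) under a uniform prior p(z). This is an illustrative assumption, not necessarily the exact reward DGPO uses; the abstract does not give the formula. The names `log_softmax`, `diversity_reward`, and `discriminator_logits` are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def diversity_reward(discriminator_logits, z):
    """Intrinsic reward log q(z|s) - log p(z), with p(z) uniform.

    discriminator_logits: array of shape (batch, n_strategies), the
        (assumed) discriminator's unnormalized scores for each strategy
        given the visited state.
    z: integer array of shape (batch,), the strategy index that actually
        generated each state.
    """
    n_strategies = discriminator_logits.shape[-1]
    log_q = log_softmax(discriminator_logits)
    # Pick out log q(z|s) for the strategy that was actually executed.
    log_q_z = np.take_along_axis(log_q, z[:, None], axis=-1)[:, 0]
    # Subtracting log p(z) = -log(n_strategies) adds log(n_strategies).
    return log_q_z + np.log(n_strategies)
```

Under this sketch the reward is zero when the discriminator cannot tell strategies apart, approaches log(n_strategies) when the executed strategy is easy to identify from the state, and turns negative when the state looks like it came from a different strategy.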
