
PerfectDou: Dominating DouDizhu with Perfect Information Distillation (2203.16406v7)

Published 30 Mar 2022 in cs.AI, cs.GT, and cs.LG

Abstract: As a challenging multi-player card game, DouDizhu has recently drawn much attention for analyzing competition and collaboration in imperfect-information games. In this paper, we propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game, in an actor-critic framework with a proposed technique named perfect information distillation. In detail, we adopt a perfect-training-imperfect-execution framework that allows the agents to utilize the global information to guide the training of the policies as if it is a perfect information game and the trained policies can be used to play the imperfect information game during the actual gameplay. To this end, we characterize card and game features for DouDizhu to represent the perfect and imperfect information. To train our system, we adopt proximal policy optimization with generalized advantage estimation in a parallel training paradigm. In experiments we show how and why PerfectDou beats all existing AI programs, and achieves state-of-the-art performance.
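The training recipe named in the abstract — proximal policy optimization with generalized advantage estimation, where the value function can be conditioned on global (perfect) information while the policy sees only the imperfect-information state — can be sketched minimally as below. This is an illustrative sketch of the GAE computation only; the function name, signature, and default coefficients are assumptions, not the paper's code.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one episode.

    rewards: rewards r_0 .. r_{T-1}
    values:  value estimates V(s_0) .. V(s_T), length T+1; the final
             entry bootstraps the tail (use 0.0 for a terminal state).

    In a perfect-training-imperfect-execution setup, `values` would come
    from a critic that sees the global state (e.g. all hands in DouDizhu);
    the critic is discarded at play time, so the policy never needs it.
    """
    advantages = []
    gae = 0.0
    # Work backwards: each advantage is the discounted, lambda-weighted
    # sum of one-step TD errors from this step onward.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        gae = delta + gamma * lam * gae
        advantages.append(gae)
    advantages.reverse()
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` to the full Monte Carlo advantage, which is the usual bias-variance knob GAE provides.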

Citations (24)
