Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs (2402.08189v2)
Abstract: When creating policies, plans, or designs for people, it is challenging for designers to foresee all of the ways in which people may reason and behave. Recently, LLMs have been shown to be able to simulate human reasoning. We extend this work by measuring LLMs' ability to simulate strategic reasoning in the ultimatum game, a classic economics bargaining experiment. Experimental evidence shows that human strategic reasoning is complex; people will often choose to punish other players to enforce social norms, even at personal expense. We test whether LLMs can replicate this behavior in simulation, comparing two structures: single LLMs and multi-agent systems. We compare their abilities to (1) simulate human-like reasoning in the ultimatum game, (2) simulate two player personalities, greedy and fair, and (3) create robust strategies that are logically complete and consistent with personality. Our evaluation shows that multi-agent systems are more accurate than single LLMs (88 percent vs. 50 percent) in simulating human reasoning and actions for personality pairs. Thus, there is potential to use LLMs to simulate human strategic reasoning, helping decision-makers and policymakers perform preliminary explorations of how people behave in systems.
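For readers unfamiliar with the ultimatum game, the following minimal sketch shows the standard payoff rule and how rejecting a low offer punishes the proposer at the responder's own expense. It is an illustration only, not the paper's simulation code: the function names `ultimatum_payoffs` and `threshold_responder`, the pot of 10, and the 30 percent acceptance threshold are all hypothetical choices made for this example.

```python
def ultimatum_payoffs(pot, offer, accepted):
    """Return (proposer_payoff, responder_payoff) for a single round.

    Standard rule: if the responder accepts, the pot is split as offered;
    if the responder rejects, both players receive nothing.
    """
    if not 0 <= offer <= pot:
        raise ValueError("offer must be between 0 and the pot size")
    if accepted:
        return pot - offer, offer
    return 0, 0


def threshold_responder(min_share):
    """Build a responder that accepts only offers of at least min_share of the pot.

    The threshold model is an assumption for illustration, not a claim
    about how people or LLMs actually decide.
    """
    def decide(pot, offer):
        return offer >= min_share * pot
    return decide


# Hypothetical example: a responder who rejects anything under 30% of the pot.
responder = threshold_responder(0.3)
pot, low_offer = 10, 2
accepted = responder(pot, low_offer)
print(ultimatum_payoffs(pot, low_offer, accepted))
```

Here the responder gives up 2 units in order to deny the proposer 8, which is the kind of costly norm enforcement the abstract describes. A purely payoff-maximizing responder would accept any positive offer.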
Authors: Karthik Sreedhar, Lydia Chilton