Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs (2402.08189v2)

Published 13 Feb 2024 in cs.HC

Abstract: When creating policies, plans, or designs for people, it is challenging for designers to foresee all of the ways in which people may reason and behave. Recently, LLMs have been shown to be able to simulate human reasoning. We extend this work by measuring LLMs' ability to simulate strategic reasoning in the ultimatum game, a classic economics bargaining experiment. Experimental evidence shows human strategic reasoning is complex; people will often choose to punish other players to enforce social norms even at personal expense. We test if LLMs can replicate this behavior in simulation, comparing two structures: single LLMs and multi-agent systems. We compare their abilities to (1) simulate human-like reasoning in the ultimatum game, (2) simulate two player personalities, greedy and fair, and (3) create robust strategies that are logically complete and consistent with personality. Our evaluation shows that multi-agent systems are more accurate than single LLMs (88 percent vs. 50 percent) in simulating human reasoning and actions for personality pairs. Thus, there is potential to use LLMs to simulate human strategic reasoning to help decision and policy-makers perform preliminary explorations of how people behave in systems.

Authors: Karthik Sreedhar, Lydia Chilton

Summary

The paper demonstrates that multi-agent LLMs accurately simulate human strategic decisions in the ultimatum game, achieving 88% accuracy.
The paper compares single-agent and multi-agent architectures, highlighting multi-agent models’ superior strategic comprehensiveness.
The paper shows that multi-agent LLMs effectively model distinct player personalities, paving the way for advanced AI-driven behavior simulations.

Simulating Human Strategic Behavior: An Evaluation of Single and Multi-agent LLMs

This essay provides an analysis of the research presented in the paper titled "Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs" by Karthik Sreedhar and Lydia Chilton. The paper investigates the capability of LLMs to simulate human-like strategic behavior, particularly in the context of the ultimatum game. Two LLM architectures are compared: single-agent and multi-agent frameworks. The paper evaluates their performance in modeling human behavior, especially focusing on strategic and personality-consistent actions.

The ultimatum game serves as the experimental framework. This classic economics game offers insight into human strategic interaction and into deviations from purely profit-maximizing play. Human subjects often engage in altruistic punishment, rejecting small but nonzero offers at a cost to themselves in order to enforce fairness norms. Reproducing this behavior is a demanding test for LLM simulations. The paper assesses LLM performance through three lenses: the ability to simulate human-like actions, to model two distinct player personalities (greedy and fair), and to create robust strategic plans that are complete and consistent with personality.
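The payoff rule of the standard ultimatum game can be sketched in a few lines of Python. This is an illustration of the textbook game, not code from the paper: the names `Outcome`, `play_round`, `payoff_maximizing_responder`, and `fairness_enforcing_responder`, the pot of 10, and the 30% rejection threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    proposer: int
    responder: int


def play_round(pot: int, offer: int, accepted: bool) -> Outcome:
    """Standard rule: an accepted offer splits the pot, a rejection leaves both with nothing."""
    if not 0 <= offer <= pot:
        raise ValueError("offer must be between 0 and the pot")
    if accepted:
        return Outcome(proposer=pot - offer, responder=offer)
    return Outcome(proposer=0, responder=0)


def payoff_maximizing_responder(offer: int) -> bool:
    """A purely profit-maximizing responder accepts any nonzero amount."""
    return offer > 0


def fairness_enforcing_responder(offer: int, pot: int, min_share: float = 0.3) -> bool:
    """Altruistic punishment: reject low offers even though rejecting costs money.

    min_share is a hypothetical threshold, not a figure from the paper.
    """
    return offer >= min_share * pot


pot, low_offer = 10, 1
rational = play_round(pot, low_offer, payoff_maximizing_responder(low_offer))
punishing = play_round(pot, low_offer, fairness_enforcing_responder(low_offer, pot))
print(rational)
print(punishing)
```

With a low offer of 1 out of 10, the payoff-maximizing responder accepts and keeps 1, while the fairness-enforcing responder rejects and both players receive nothing. The second pattern is the human-like behavior the simulations are asked to reproduce.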

Key Findings and Methodology

Simulation Infrastructure and Evaluation:

Single vs. Multi-agent Architectures:
- The single-agent architecture uses one GPT-4 instance to simulate the entire game, handling both players.
- Multi-agent architecture represents each player as a distinct GPT-4 instance, allowing for interaction dynamics more akin to independent agents.
Main Results:
- The multi-agent architecture achieved high accuracy (88%) in emulating human strategies and behavioral adherence to distinct personalities, substantially outperforming the single LLM setup (50% accuracy).
- The majority of errors in the single-agent simulations were attributed to incomplete strategic plans, reinforcing the superiority of the multi-agent approach in strategic comprehensiveness.
Gameplay Accuracy and Personality Modeling:
- Multi-agent LLMs demonstrated effective modeling for both personality archetypes across various pairings.
- Errors primarily stemmed from strategy inconsistencies rather than gameplay deviations, suggesting gaps in pre-simulation strategic formulation rather than dynamic interactions.
Methodological Approach and Parameters:
- Simulations were conducted across 40 different sessions for each condition.
- GPT-4 instances generated reasoned outputs that followed personality-driven strategies reflecting human behavior.
- The ultimatum game was iterated over five rounds, emphasizing longitudinal interaction dynamics.
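
The structural difference between the two architectures over a five-round game can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompts, the pot size, the transcript format, and every function name are hypothetical, and the two `stub_*` functions are deterministic stand-ins for GPT-4 calls so that the example runs offline.

```python
from typing import Callable, List, Tuple

ROUNDS = 5  # the game was iterated over five rounds
POT = 10    # assumed pot size, not taken from the paper

Model = Callable[[str], str]       # any callable mapping a prompt to a reply
History = List[Tuple[int, bool]]   # (offer, accepted) for each round


def stub_player_model(prompt: str) -> str:
    """Hypothetical stand-in for one GPT-4 instance playing a single role."""
    if prompt.startswith("You are the responder"):
        offer = int(prompt.rsplit("offer=", 1)[1])
        return "accept" if offer >= 3 else "reject"
    return "2" if "Personality: greedy" in prompt else "5"


def stub_single_model(prompt: str) -> str:
    """Hypothetical stand-in for one GPT-4 instance writing out the whole game."""
    offer = 2 if "Proposer personality: greedy" in prompt else 5
    decision = "accept" if offer >= 3 else "reject"
    return "\n".join(f"{offer},{decision}" for _ in range(ROUNDS))


def run_multi_agent(proposer: Model, responder: Model,
                    proposer_personality: str, responder_personality: str) -> History:
    """Each player is a separate model call that sees only its own role and personality."""
    history: History = []
    for _ in range(ROUNDS):
        offer = int(proposer(
            f"You are the proposer. Personality: {proposer_personality}. "
            f"Pot={POT}. State your offer."))
        reply = responder(
            f"You are the responder. Personality: {responder_personality}. "
            f"Pot={POT}. offer={offer}")
        history.append((offer, reply == "accept"))
    return history


def run_single_agent(model: Model, proposer_personality: str,
                     responder_personality: str) -> History:
    """One model call produces the transcript for both players at once."""
    transcript = model(
        f"Simulate both players of a {ROUNDS}-round ultimatum game. "
        f"Proposer personality: {proposer_personality}. "
        f"Responder personality: {responder_personality}. Pot={POT}. "
        "Output one line per round as '<offer>,<accept|reject>'.")
    history: History = []
    for line in transcript.strip().splitlines():
        offer_text, decision = line.split(",")
        history.append((int(offer_text), decision.strip() == "accept"))
    return history


print(run_multi_agent(stub_player_model, stub_player_model, "greedy", "fair"))
print(run_single_agent(stub_single_model, "fair", "fair"))
```

In the multi-agent version each decision is a separate call whose prompt contains only that player's role and personality, whereas the single-agent version asks one call to author both sides of every round. Replacing the stubs with real model calls would leave the control flow of both functions unchanged.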

Implications and Potential for AI

The findings suggest that multi-agent LLMs could be applied to simulating strategic human behavior. Such simulations could benefit fields like policy-making, economics, human-computer interaction design, and strategic planning. By modeling personality-driven strategies realistically, LLM-based simulations may help decision and policy-makers run preliminary explorations of how individuals respond in strategically competitive environments.

The strong performance of the multi-agent framework in this setting suggests future potential for using LLMs to replicate complex, multi-faceted human behavior. However, the current paper has clear constraints: it studies a single controlled experimental game, and its results may not transfer to real-world applications. Whether LLMs can maintain this behavioral fidelity in more intricate, high-stakes strategic contexts remains an open question.

Conclusions and Future Directions

The paper advances the understanding of LLM capabilities in simulating nuanced human-like behavior. Multi-agent architectures reproduced strategic interactions consistent with human experimental baselines. This work opens the way for further investigation of more advanced interaction dynamics, extending beyond simple economic games to more intricate, real-life scenarios. Key areas for future research include adaptive strategy development, variability in environmental context, and broader agent-based simulations. As such, this foundational work bodes well for AI-driven behavior simulation in socio-economic and policy settings.
