Do LLM Agents Exhibit Social Behavior? (2312.15198v3)

Published 23 Dec 2023 in cs.AI, cs.SI, econ.GN, and q-fin.EC

Abstract: As LLMs increasingly take on roles in human-AI interactions and autonomous AI systems, understanding their social behavior becomes important for informed use and continuous improvement. However, their behaviors in social interactions with humans and other agents, as well as the mechanisms shaping their responses, remain underexplored. To address this gap, we introduce a novel probabilistic framework, State-Understanding-Value-Action (SUVA), to systematically analyze LLM responses in social contexts based on their textual outputs (i.e., utterances). Using canonical behavioral economics games and social preference concepts relatable to LLM users, SUVA assesses LLMs' social behavior through both their final decisions and the response generation processes leading to those decisions. Our analysis of eight LLMs -- including two GPT, four LLaMA, and two Mistral models -- suggests that most models do not generate decisions aligned solely with self-interest; instead, they often produce responses that reflect social welfare considerations and display patterns consistent with direct and indirect reciprocity. Additionally, higher-capacity models more frequently display group identity effects. The SUVA framework also provides explainable tools -- including tree-based visualizations and probabilistic dependency analysis -- to elucidate how factors in LLMs' utterance-based reasoning influence their decisions. We demonstrate that utterance-based reasoning reliably predicts LLMs' final actions; references to altruism, fairness, and cooperation in the reasoning increase the likelihood of prosocial actions, while mentions of self-interest and competition reduce them. Overall, our framework enables practitioners to assess LLMs for applications involving social interactions, and provides researchers with a structured method to interpret how LLM behavior arises from utterance-based reasoning.
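
As a rough illustration of the abstract's last point, one could code binary cues from each model utterance and fit a logistic regression that predicts the final action. The feature set and toy data below are hypothetical, not the paper's actual SUVA coding scheme:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary cues coded from an LLM's reasoning text:
# [altruism, fairness, cooperation, self_interest, competition]
X = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = prosocial final action, 0 = selfish

model = LogisticRegression().fit(X, y)
# Positive weights on altruism/fairness/cooperation and negative weights on
# self-interest/competition would mirror the pattern the abstract reports.
features = ["altruism", "fairness", "cooperation", "self_interest", "competition"]
print(dict(zip(features, model.coef_[0].round(2))))
print("P(prosocial):", model.predict_proba(X)[:, 1].round(2))
```

The paper itself uses tree-based visualizations and probabilistic dependency analysis over the full SUVA pipeline rather than a single regression of this kind.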

Citations (20)

Summary

  • The paper introduces a novel framework adapted from classical human experiments to assess social behavior in LLMs.
  • It employs economic modeling and regression analysis to evaluate social learning, preferences, and cooperation in GPT-4.
  • Findings reveal that GPT-4 exhibits human-like fairness and analytical social learning, though significant behavioral differences remain.

Overview of the Study

The paper examines the capacity of LLMs to simulate key social behaviors, a growing area of interest in artificial intelligence. The researchers developed a novel framework, drawing on classical human behavior experiments, to assess the degree of social behavior that LLMs exhibit.

Methodology

The paper describes an experimental design adapted from classical human social interaction studies to evaluate LLM agents, with a particular focus on GPT-4. The model's behavior was analyzed across several social dimensions, including social learning, social preferences, and cooperation. GPT-4's responses were then examined with economic modeling and regression analysis to identify the factors driving its decisions.
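
A minimal sketch of this kind of elicitation, assuming a dictator-game prompt and a hypothetical `query_llm` helper (the paper's actual prompts and protocol are not reproduced here):

```python
import re
from collections import Counter

ENDOWMENT = 100

PROMPT = (
    f"You have {ENDOWMENT} points. Decide how many to give to an anonymous "
    "partner and how many to keep. Reply in the form 'GIVE: <number>'."
)

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in; replace with a real chat-completion call."""
    return "GIVE: 50"  # mocked reply so the sketch runs end to end

def parse_gift(reply: str) -> int | None:
    match = re.search(r"GIVE:\s*(\d+)", reply)
    return int(match.group(1)) if match else None

def run_trials(n: int = 30) -> Counter:
    """Collect repeated dictator-game allocations from the model."""
    gifts = Counter()
    for _ in range(n):
        gift = parse_gift(query_llm(PROMPT))
        if gift is not None and 0 <= gift <= ENDOWMENT:
            gifts[gift] += 1
    return gifts

# A purely self-interested agent would give 0 every time; consistently
# positive gifts suggest the social-welfare considerations the paper reports.
print(run_trials(5))
```

Elicited allocations can then be regressed on treatment variables (partner identity, framing, game type) to quantify the behavioral effects.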

Findings on Social Behavior

LLM agents exhibit certain human-like social tendencies, as suggested by their distributional preferences and responsiveness to group identities, albeit with pronounced differences. For example, LLMs displayed a marked concern for fairness, showed weaker positive reciprocity than humans, and took a more analytical stance in social learning scenarios. These observations indicate that while LLMs can replicate aspects of human behavior, the nuances of their social interactions warrant further exploration.
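
The summary does not give the paper's exact specification of distributional preferences; one canonical formalization from behavioral economics is the Fehr-Schmidt inequity-aversion utility, sketched here with illustrative parameters:

```python
def fehr_schmidt_utility(own: float, other: float,
                         alpha: float = 0.8, beta: float = 0.6) -> float:
    """Fehr-Schmidt two-player inequity-aversion utility.

    alpha penalizes disadvantageous inequality (the other player is ahead);
    beta penalizes advantageous inequality (you are ahead). The parameter
    values here are illustrative, not estimates from the paper.
    """
    return own - alpha * max(other - own, 0) - beta * max(own - other, 0)

# With these parameters an even split beats keeping everything,
# the fairness-concerned pattern attributed to the models.
print(fehr_schmidt_utility(100, 0))   # 40.0: keep all 100, partner gets 0
print(fehr_schmidt_utility(50, 50))   # 50.0: even split
```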

Implications and Potential for Social Science Research

The paper concludes that LLMs such as GPT-4 show promise for applications in social science research. They have the potential to simulate complex social interactions, offering valuable insights for fields such as agent-based modeling and policy evaluation. However, researchers should proceed with caution given the subtle but significant deviations of LLM behavior from that of human subjects. The paper encourages further examination and careful application of LLMs to ensure they are represented and used accurately in social-system simulations.


Authors (2)