Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

Published 8 Mar 2024 in cs.CL and cs.AI | (2403.05020v4)

Abstract: Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., single LLM to generate all interlocutors), which is fundamentally at odds with the non-omniscient, information asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.

Summary

The paper demonstrates that Script mode's omniscience inflates social goal success compared with the more realistic Agents mode.
The paper introduces a structured evaluation framework that differentiates between information-rich and information-asymmetric simulation setups.
The paper finds that finetuning on Script data improves dialogue naturalness but does not significantly enhance goal completion accuracy in cooperative tasks.

The paper "Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs" investigates how well LLMs simulate human social interactions. The authors identify a fundamental mismatch between the way LLMs are typically used to simulate these interactions and the non-omniscient, information-asymmetric nature of human communication.

The authors develop a structured evaluation framework that distinguishes between two modes of simulation: Script mode, where a single LLM has omniscient access to all participants' information and goals, and Agents mode, where multiple LLMs independently simulate distinct agents without access to each other's internal states. Their experiments show that Script mode overestimates both social goal achievement and interaction naturalness relative to the more realistic Agents mode.

The quantitative results show a clear performance gap: Script mode simulations achieve social goals more often and produce more fluid dialogue. Agents mode, whose information asymmetry better reflects how humans interact, yields less natural dialogue and lower goal completion. Giving agents access to each other's mental states (referred to as Mindreaders mode) also outperforms the information-asymmetric Agents setting, which indicates that access to shared information is a major driver of interaction outcomes.
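
The difference between the three modes comes down to how many generators there are and what each one is allowed to see. The following is a minimal sketch of that distinction only, not the authors' implementation: the names `Mode`, `Character`, `visible_goals`, and `num_generators` are hypothetical, and a character's private information is reduced to a single `goal` string for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SCRIPT = "script"            # one generator writes every turn and sees everything
    AGENTS = "agents"            # one generator per character; private goals stay private
    MINDREADERS = "mindreaders"  # one generator per character; others' goals are visible


@dataclass(frozen=True)
class Character:
    name: str
    goal: str  # private social goal (hypothetical simplification)


def visible_goals(mode: Mode, speaker: Character, everyone: list[Character]) -> dict[str, str]:
    """Goals that the generator producing `speaker`'s turn may condition on."""
    if mode is Mode.AGENTS:
        # Information asymmetry: the speaker knows only its own goal.
        return {speaker.name: speaker.goal}
    # Script and Mindreaders both expose every character's goal.
    return {c.name: c.goal for c in everyone}


def num_generators(mode: Mode, everyone: list[Character]) -> int:
    """Script uses a single generator; the other modes use one per character."""
    return 1 if mode is Mode.SCRIPT else len(everyone)
```

Under these assumptions, Mindreaders mode isolates the effect of shared information: it keeps one generator per character, as in Agents mode, but removes the asymmetry.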

The paper also examines whether training LLMs on data from Script simulations improves simulations of real-world interactions. Finetuning LLMs on Script data improved dialogue naturalness but did not significantly improve goal completion in cooperative scenarios, where accurately inferring the interlocutor's unknown states is vital. The authors attribute this limited improvement to biases inherent in Script simulations: with unrestricted access to every participant's internal states, omniscient setups tend to produce overly agreeable or unnatural decision-making strategies.

The authors recommend that related research report its simulation mode explicitly and acknowledge the limitations identified in their findings. By analogy with model cards, they propose "simulation cards" that document simulation procedures in detail, supporting clearer discussion of how LLM-based agents are applied and evaluated when simulating social interactions.
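
As a rough illustration of what a simulation card could record, here is a small sketch. This summary does not list the card's actual contents, so every field name below is an assumption made for illustration, and `example-llm` is a placeholder rather than a real model name.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SimulationCard:
    # All fields are hypothetical; they are not taken from the paper.
    mode: str                      # e.g. "script", "agents", "mindreaders"
    models: list[str]              # which LLM generates each participant
    shared_information: list[str]  # what every generator is allowed to see
    intended_use: str
    known_limitations: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the card so it can be published alongside results."""
        return json.dumps(asdict(self), indent=2)


# Example card for an information-asymmetric, two-agent simulation.
card = SimulationCard(
    mode="agents",
    models=["example-llm", "example-llm"],
    shared_information=["scenario description"],
    intended_use="studying goal completion under information asymmetry",
    known_limitations=["private goals are never revealed to the other agent"],
)
print(card.to_json())
```

A machine-readable format like this would let readers check at a glance whether reported results come from an omniscient or an information-asymmetric setup.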

For future work, the paper calls for more human-like modeling approaches that move beyond omniscience and simulate human strategic reasoning under information asymmetry. Such modeling might involve more explicit scaffolding of LLM responses based on inferred beliefs and shared knowledge within dialogues.

This research cautions against the oversimplification involved in LLM-based social simulations and urges the field to recognize current limitations and aim for closer alignment with human cognitive and social processes. The paper ultimately highlights the enduring gap between how LLMs handle interactions and the complexity of human interaction, and presents closing that gap as a step toward more nuanced and practical applications in social AI.
