Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs (2403.05020v4)
Abstract: Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.
Authors: Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap