When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings (2407.04503v3)
Abstract: As LLMs start interacting with each other and generating an increasing amount of text online, it becomes crucial to better understand how information is transformed as it passes from one LLM to the next. While significant research has examined individual LLM behaviors, existing studies have largely overlooked the collective behaviors and information distortions arising from iterated LLM interactions. Small biases, negligible at the single-output level, risk being amplified over iterated interactions, potentially causing the content to evolve towards attractor states. In a series of telephone-game experiments, we apply a transmission chain design borrowed from the human cultural evolution literature: LLM agents iteratively receive, produce, and transmit texts from the previous to the next agent in the chain. By tracking the evolution of text toxicity, positivity, difficulty, and length across transmission chains, we uncover the existence of biases and attractors, and study their dependence on the initial text, the instructions, the LLM, and the model size. For instance, we find that more open-ended instructions lead to stronger attraction effects than more constrained tasks. We also find that different text properties display different sensitivities to attraction effects, with toxicity leading to stronger attractors than length. These findings highlight the importance of accounting for multi-step transmission dynamics and represent a first step towards a more comprehensive understanding of LLM cultural dynamics.
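The transmission chain design described above can be sketched in a few lines: a seed text is passed down a chain of agents, each agent's output becoming the next agent's input, while a property of interest is recorded at every generation. The sketch below is illustrative only — `llm_generate` stands in for a real model call (the paper's actual prompts, models, and metrics are not reproduced here), and the toy agent, which simply drops the last word each step, mimics a hypothetical length bias so the loop runs without an API.

```python
# Minimal sketch of a transmission-chain ("telephone game") experiment.
from typing import Callable, List


def run_chain(seed_text: str,
              llm_generate: Callable[[str], str],
              n_generations: int) -> List[str]:
    """Pass a text down a chain of agents, storing each generation's output."""
    texts = [seed_text]
    for _ in range(n_generations):
        texts.append(llm_generate(texts[-1]))
    return texts


def toy_agent(text: str) -> str:
    # Stand-in for an LLM call: drops the last word each step,
    # mimicking a (hypothetical) bias toward shorter texts.
    words = text.split()
    return " ".join(words[:-1]) if len(words) > 1 else text


chain = run_chain("the quick brown fox jumps over the lazy dog", toy_agent, 5)
lengths = [len(t.split()) for t in chain]
print(lengths)  # word count per generation: [9, 8, 7, 6, 5, 4]
```

In an actual experiment, `toy_agent` would be replaced by a prompted model call, and `lengths` by the tracked properties (toxicity, positivity, difficulty, length); plotting such trajectories across many chains and seeds is what reveals drift toward attractor states.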