Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation (2309.17234v2)
Abstract: There is a growing interest in using LLMs in multi-agent systems to tackle interactive real-world tasks that require effective collaboration and the assessment of complex situations. Yet, we still have a limited understanding of LLMs' communication and decision-making abilities in multi-agent setups. The fundamental task of negotiation spans many key features of communication, such as cooperation, competition, and the potential for manipulation. Thus, we propose using scorable negotiation to evaluate LLMs. We create a testbed of complex multi-agent, multi-issue, and semantically rich negotiation games. To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities while integrating them in a dynamic, multi-turn setup. We propose multiple metrics to rigorously quantify agents' performance and alignment with the assigned role. We provide procedures to create new games and increase their difficulty, so that the benchmark can evolve. Importantly, we evaluate critical safety aspects such as the interaction dynamics between agents influenced by greedy and adversarial players. Our benchmark is highly challenging; GPT-3.5 and small models mostly fail, and GPT-4 and SoTA large models (e.g., Llama-3 70b) still underperform.