
Learning to Use Tools via Cooperative and Interactive Agents (2403.03031v4)

Published 5 Mar 2024 in cs.CL

Abstract: Tool learning empowers LLMs as agents to use external tools and extend their utility. Existing methods employ one single LLM-based agent to iteratively select and execute tools, thereafter incorporating execution results into the next action prediction. Despite their progress, these methods suffer from performance degradation when addressing practical tasks due to: (1) the pre-defined pipeline with restricted flexibility to calibrate incorrect actions, and (2) the struggle to adapt a general LLM-based agent to perform a variety of specialized actions. To mitigate these problems, we propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration separately. ConAgents introduces two communication protocols to enable the flexible cooperation of agents. To effectively generalize the ConAgents into open-source models, we also propose specialized action distillation, enhancing their ability to perform specialized actions in our framework. Our extensive experiments on three datasets show that the LLMs, when equipped with the ConAgents, outperform baselines with substantial improvement (i.e., up to 14% higher success rate).


Summary

  • The paper presents a novel multi-agent framework (ConAgents) that decomposes tool learning tasks among specialized agents for improved execution.
  • It introduces Iterative Calibration to adaptively refine tool use through feedback from tool servers and code interpreters, boosting success rates by up to 6%.
  • The modular design optimizes computational efficiency and sets the stage for extending multi-agent systems to include multi-modality applications.

Learning to Use Tools via Cooperative and Interactive Agents

Introduction

The paper "Learning to Use Tools via Cooperative and Interactive Agents" (2403.03031) presents a novel approach to enhancing the proficiency of LLMs in tool learning, addressing limitations inherent in single-agent systems. Conventional methodologies that employ a solitary LLM agent face challenges in complex tasks due to the restricted capabilities of one agent and the difficulty in error correction during task failures. This work introduces ConAgents, a framework leveraging multiple specialized agents—Grounding, Execution, and Observing—to distribute task workloads and incorporate feedback from the tool environment for adaptive iteration.

Methodology

Cooperative Framework Design

ConAgents decomposes the task workflow into three distinct modules managed by independent LLM-based agents, which collectively enhance tool learning capabilities.

  • Grounding Agent: This agent reasons over the user's input task and generates specific tool-use instructions, such as selecting the most applicable tool and defining the target outputs.
  • Execution Agent: Following the tool-use instruction, this agent completes necessary arguments and requests data from tool servers, handling the execution process.
  • Observing Agent: Designed to efficiently incorporate lengthy execution results, this agent dynamically generates functions to extract the relevant values, removing the need for a pre-defined schema and allowing more flexible handling of execution outputs.

    Figure 1: Comparison between existing single-agent tool learning method (a) and our ConAgents (b).
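The three-agent workflow above can be sketched as a simple pipeline. This is a minimal illustration only: the three agents are stubbed with plain Python functions and a mock tool-server response, whereas in the actual framework each role is a separate LLM call; all names and the response format are illustrative assumptions, not the paper's implementation.

```python
import json

def grounding_agent(task: str) -> dict:
    """Stub for the grounding agent: select a tool and state the target output."""
    return {"tool": "weather_api", "instruction": f"Retrieve for task: {task}"}

def execution_agent(step: dict) -> str:
    """Stub for the execution agent: fill in arguments and query the tool server.
    Here we just return a mock server response."""
    return '{"city": "Paris", "temp_c": 18}'

def observing_agent(result: str) -> str:
    """Stub for the observing agent: extract the relevant value from a
    (potentially lengthy) execution result. A real agent would generate
    this extraction code dynamically."""
    return str(json.loads(result)["temp_c"])

def run_pipeline(task: str) -> str:
    step = grounding_agent(task)   # 1. plan: which tool, what output
    raw = execution_agent(step)    # 2. act: call the tool server
    return observing_agent(raw)    # 3. observe: extract the answer

print(run_pipeline("current temperature in Paris"))  # -> 18
```

The key design point is the separation of concerns: each stage consumes only the output of the previous one, so any stage can be swapped for a specialized (or distilled) model without touching the others.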

Iterative Calibration Method

The paper introduces Iterative Calibration (IterCali) to dynamically optimize agent performance by utilizing environmental feedback.

  • Tool Server Interaction: The execution agent calibrates incorrect arguments based on error messages received from tool server responses, facilitating real-time correction of issues during tool invocation.
  • Code Interpreter Feedback: The observing agent adjusts its generated code based on programming errors flagged by the interpreter, ensuring accurate value extraction from execution results.

    Figure 2: The prompts for our Iterative Calibration (IterCali) method, which instructs the execution agent to calibrate generated arguments (a) and observing agent to refine generated code (b) with external feedback.
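The calibration loop described above can be sketched as a retry loop that feeds the environment's error message back to the agent. This is a hedged sketch under simplifying assumptions: the tool server and the calibration step are mocked with plain functions, and the retry budget, argument names, and error format are all hypothetical.

```python
MAX_TURNS = 3  # illustrative retry budget

def call_tool(args: dict) -> dict:
    """Mock tool server: rejects requests missing the 'date' argument."""
    if "date" not in args:
        raise ValueError("missing required argument: date")
    return {"status": "ok", "args": args}

def calibrate(args: dict, error: str) -> dict:
    """Mock execution agent: patch the arguments using the error message.
    A real agent would prompt an LLM with the error as feedback."""
    if "date" in error:
        return {**args, "date": "2024-03-05"}
    return args

def iter_cali(args: dict) -> dict:
    """Iteratively call the tool, calibrating arguments on each failure."""
    for _ in range(MAX_TURNS):
        try:
            return call_tool(args)
        except ValueError as err:
            args = calibrate(args, str(err))  # feed error back to the agent
    raise RuntimeError("calibration budget exhausted")

print(iter_cali({"city": "Paris"})["status"])  # -> ok
```

The same loop shape applies to the observing agent: swap the tool server for a code interpreter and the error message for a traceback.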

Experimental Setup

Experiments were conducted across three datasets, demonstrating ConAgents' effectiveness over state-of-the-art baselines with up to a 14% higher success rate. Human evaluations further supported these findings, illustrating the framework's competency in logical tool execution and result parsing.

Efficiency and Performance

Qualitative analyses highlight the efficiency of ConAgents in terms of token consumption during inference, showcasing its competitiveness against multi-turn baselines like ReAct@N. The modular design reduces computational overhead by optimizing each agent's operations.

Figure 3: The efficiency analysis for different methods, where we count the distribution of consumed tokens and compute the average consumption μ.

Conclusion

ConAgents introduces a robust, modular approach to tool learning, enabling LLMs to tackle complex tasks more effectively through cooperative interaction and adaptive calibration. By decoupling task components and leveraging feedback mechanisms, this framework sets the groundwork for advancing multi-agent systems in AI tool interaction.

Theoretical and practical implications suggest potential advancements in AI collaborative frameworks, prompting further exploration into multi-agent systems beyond textual domains, incorporating visual and auditory information. Future work could explore extending agent capabilities with multi-modality support, broadening the applicability of such cooperative architectures.
