Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub (2312.17294v3)

Published 28 Dec 2023 in cs.SE, cs.AI, and cs.IR

Abstract: LLMs excel in traditional natural language processing tasks but struggle with problems that require complex domain-specific calculations or simulations. While equipping LLMs with external tools to build LLM-based agents can enhance their capabilities, existing approaches lack the flexibility to address diverse and ever-evolving user queries in open domains. There is also no existing dataset that evaluates LLMs on open-domain questions that require tools to solve. To this end, we introduce the OpenAct benchmark to evaluate open-domain task-solving capability, built on human expert consultation and GitHub repositories. It comprises 339 questions spanning 7 diverse domains that must be solved with domain-specific methods. In our experiments, even state-of-the-art LLMs and LLM-based agents show unsatisfactory success rates, underscoring the need for a novel approach. Furthermore, we present OpenAgent, a novel LLM-based agent system that tackles evolving queries in open domains by autonomously integrating specialized tools from GitHub. OpenAgent employs 1) a hierarchical framework in which specialized agents handle specific tasks and can assign tasks to subordinate agents, and 2) a bi-level experience learning mechanism that learns from both human experience and its own to cope with tool flaws. Experiments demonstrate that OpenAgent significantly outperforms baselines in both effectiveness and efficiency. Our data and code are open-source at https://github.com/OpenBMB/OpenAct.
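
The hierarchical delegation and two-level experience idea from the abstract can be illustrated with a toy sketch. This is not the OpenAgent implementation: the class names `Agent` and `ExperienceStore`, the `can_handle` predicate, and the string-keyed experience lookup are all hypothetical, chosen only to show how a top-level agent might hand a task to lower-level agents while consulting human-provided and self-recorded notes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ExperienceStore:
    """Hypothetical two-level store: notes seeded by people and notes recorded by agents."""
    human: Dict[str, str] = field(default_factory=dict)
    learned: Dict[str, str] = field(default_factory=dict)

    def lookup(self, task: str) -> Optional[str]:
        # Prefer the agent's own recorded experience, then fall back to human notes.
        return self.learned.get(task) or self.human.get(task)


@dataclass
class Agent:
    """Hypothetical agent that handles a task itself or assigns it to lower-level agents."""
    name: str
    can_handle: Callable[[str], bool]
    subordinates: List["Agent"] = field(default_factory=list)

    def solve(self, task: str, store: ExperienceStore) -> Optional[str]:
        if self.can_handle(task):
            hint = store.lookup(task)
            result = f"{self.name} solved {task!r}"
            if hint:
                result += f" using hint: {hint}"
            # Record the outcome so later attempts can reuse it.
            store.learned[task] = f"solved by {self.name}"
            return result
        # Otherwise delegate to the first lower-level agent that succeeds.
        for sub in self.subordinates:
            outcome = sub.solve(task, store)
            if outcome is not None:
                return outcome
        return None


def build_demo() -> Agent:
    """Build a small two-level hierarchy with illustrative agent names."""
    setup = Agent("setup", lambda t: t.startswith("setup"))
    apply = Agent("apply", lambda t: t.startswith("apply"))
    return Agent("controller", lambda t: False, [setup, apply])
```

In this sketch the controller never solves anything itself; it only routes tasks, which mirrors the division of labor the abstract describes at a very coarse level. A real system would replace the string predicates with LLM calls and repository operations.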
