CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models

Abstract

Large language models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability and the instability of implicit reasoning, particularly when planning and execution are both involved. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools through documentation and code realization. CREATOR disentangles abstract tool creation from concrete decision execution, resulting in improved performance. We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and diverse tabular contents, respectively. Remarkably, CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability. Further analysis demonstrates that leveraging LLMs as tool creators facilitates knowledge transfer, and that LLMs exhibit varying levels of tool creation ability, enabling them to adapt to diverse situations. The tool creation ability revolutionizes the LLM's problem-solving paradigm, driving us closer to the next frontier of artificial intelligence. All code and data are released.
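To make the abstract concrete, below is a minimal sketch of a tool-creation pipeline in the spirit the abstract describes: the LLM first writes an abstract, documented tool, then writes concrete decision code that applies the tool to the given instance, and finally the code is executed, with errors fed back for rectification. The `query_llm` interface, the prompt wording, and the rectification loop structure are illustrative assumptions, not the paper's exact prompts or implementation.

```python
from typing import Callable


def solve_with_created_tool(
    question: str,
    query_llm: Callable[[str], str],   # hypothetical LLM interface: prompt -> text
    max_rectifications: int = 3,
) -> dict:
    """Sketch of a create-then-execute pipeline, assuming a generic LLM call."""
    # Stage 1: abstract tool creation (general-purpose, documented code).
    tool_code = query_llm(
        "Write a documented Python function that solves problems like:\n"
        f"{question}\nReturn only code."
    )
    # Stage 2: concrete decision -- short code that calls the tool on this
    # instance and stores the result in a variable named `answer`.
    decision_code = query_llm(
        f"Given this tool:\n{tool_code}\n"
        f"Write code that calls it to answer:\n{question}\n"
        "Assign the final result to a variable `answer`."
    )
    # Stage 3: execution, with a simple rectification loop on errors.
    for _ in range(max_rectifications + 1):
        namespace: dict = {}
        try:
            exec(tool_code + "\n" + decision_code, namespace)  # illustrative only
            return {"answer": namespace.get("answer"), "tool": tool_code}
        except Exception as err:
            # Feed the error back so the model can rectify its own tool.
            tool_code = query_llm(
                f"The tool below failed with: {err!r}\n{tool_code}\n"
                "Fix the tool and return only the corrected code."
            )
    return {"answer": None, "tool": tool_code}
```

The key design point reflected here is the separation of concerns: the tool is written once at an abstract level, while the decision code handles only the instance-specific call, so errors can be localized and rectified without re-deriving the whole solution.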

