Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
Toolformer introduces a novel approach for language models to autonomously use external tools via APIs, enhancing their performance on tasks requiring real-time data or arithmetic precision.
The model learns through a self-supervised process that requires no extensive human annotation: sampled API calls and their outcomes provide the training signal that improves performance across diverse tasks.
Experiments show Toolformer's efficacy in factual completion, mathematical reasoning, multilingual question answering, and temporal reasoning.
The methodology advances the ability of LMs to interface with tools autonomously, indicating promising directions for future AI research and applications.
Language models (LMs) have shown remarkable proficiency in numerous natural language processing tasks by leveraging large datasets for pretraining. Despite their success, these models encounter limitations when tasked with problems that require real-time data access, arithmetic precision, or understanding of low-resource languages. The paper introduces Toolformer, a novel approach that enables LMs to autonomously decide when and how to utilize external tools via simple APIs. This capability is acquired through a self-supervised learning process that does not necessitate extensive human annotation, thus preserving the model's generality across tasks. By incorporating tools such as calculators, translation systems, and search engines, Toolformer demonstrates substantial enhancements in zero-shot performance on diverse downstream tasks without compromising its intrinsic language modeling capabilities.
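Concretely, the paper represents tool use as plain text embedded in the token stream: an API call and its result appear inline, delimited by special tokens, so the model can learn to emit them like any other tokens. A minimal illustration (the sentence content is made up for illustration; ASCII "->" stands in for the paper's arrow token):

```python
# Toolformer encodes an API call and its result as inline text within the
# training sequence, using the paper's [API(input) -> result] linearization.
# The surrounding sentence is ordinary language-modeling data.
text_with_call = (
    "The population of Berlin is "
    "[QA(What is the population of Berlin?) -> 3.6 million] "
    "3.6 million people."
)
```

During fine-tuning the model learns to emit such calls itself, since they appear in its training text exactly where they help predict what follows.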
Toolformer is predicated on the insight that LMs can generate and evaluate their own dataset annotations using APIs, provided they have a handful of demonstrations for each tool. The LM is fine-tuned on an augmented dataset where API calls and their outcomes are integrated based on a self-supervised loss calculation, which measures the utility of these insertions in predicting subsequent tokens. This process involves three steps: (1) prompted with a few demonstrations, the LM samples candidate API calls at plausible positions in the text; (2) the calls are executed and their results obtained; (3) each call is retained only if inserting the call together with its result reduces the loss over the following tokens by more than a threshold, compared to making no call or making the call without its result. The model is then fine-tuned on the text augmented with the retained calls.
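The filtering criterion at the heart of this process can be sketched as follows. The scoring function below is a toy stand-in for the LM's weighted cross-entropy over the tokens that follow the candidate call position; all names, the example text, and the threshold value are illustrative assumptions, not the paper's implementation:

```python
# Toy stand-in for an LM's loss on a continuation given a prefix. A real
# implementation would query the model for token log-probabilities; here,
# tokens already seen in the prefix are simply treated as "cheaper".
def toy_loss(prefix: str, continuation: str) -> float:
    known = {tok.lower().strip(".,[]") for tok in prefix.split()}
    return sum(0.1 if tok.lower().strip(".,[]") in known else 1.0
               for tok in continuation.split())

def keep_api_call(context: str, call_text: str, result_text: str,
                  continuation: str, tau: float = 0.5) -> bool:
    """Toolformer-style filtering rule: keep a candidate API call only if
    inserting the call *with its result* lowers the loss on the following
    tokens by more than tau, compared to the better of (a) no call at all
    and (b) the call without its result."""
    loss_with_result = toy_loss(f"{context} [{call_text} -> {result_text}]",
                                continuation)
    loss_without_call = toy_loss(context, continuation)
    loss_call_no_result = toy_loss(f"{context} [{call_text}]", continuation)
    baseline = min(loss_without_call, loss_call_no_result)
    return baseline - loss_with_result > tau

# The calculator result "0.29" makes the continuation cheaper to predict,
# so this candidate call would be kept for the fine-tuning data.
kept = keep_api_call(
    context="Out of 1400 participants, 400",
    call_text="Calculator(400 / 1400)",
    result_text="0.29",
    continuation="0.29 of participants passed the test",
)
```

Comparing against the better of the two baselines matters: it filters out calls whose mere presence (rather than their returned result) happens to help prediction.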
This methodology enables an LM to autonomously leverage external tools that compensate for its innate limitations.
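At inference time, this plays out as interleaved decoding and execution: when the model emits the arrow token inside an API call, generation pauses, the tool is invoked, and its result is spliced back into the sequence before decoding resumes. A minimal sketch, with ASCII "->" standing in for the paper's arrow token and a hypothetical toy tool registry:

```python
import re

# Hypothetical tool registry; names and behavior are illustrative only.
# eval() is used purely as a toy calculator for this sketch.
TOOLS = {
    "Calculator": lambda expr: str(round(eval(expr), 2)),
}

def execute_pending_call(text: str) -> str:
    """If decoding has just produced '[Tool(args) ->', run the tool and
    splice its result plus the closing bracket into the sequence;
    otherwise return the text unchanged and decoding continues normally."""
    match = re.search(r"\[(\w+)\((.*?)\)\s*->$", text)
    if match is None:
        return text
    tool, args = match.group(1), match.group(2)
    result = TOOLS[tool](args)
    return f"{text} {result}]"

# Generation paused right after the model emitted the arrow token:
completed = execute_pending_call(
    "Out of 1400 participants, 400 [Calculator(400 / 1400) ->")
# The result is inserted, and ordinary decoding resumes after ']'.
```

The model itself never sees the tool's internals; it only learns the textual protocol for requesting and consuming results.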
The experimental evaluation illustrates Toolformer's effectiveness across a spectrum of tasks: factual completion, mathematical reasoning, question answering, multilingual question answering, and temporal reasoning, with zero-shot gains that often rival much larger models.
Toolformer's ability to decide autonomously which tool is most appropriate, and to apply it, yields performance improvements previously unattainable without human intervention or explicit instruction in tool usage.
The introduction of Toolformer opens several avenues for future research and practical application in the realm of generative AI. Its ability to augment LMs with the capability to interface with external tools autonomously and in a context-aware manner can significantly expand the scope of tasks these models can handle effectively. This includes real-time information retrieval, precise quantitative analysis, and handling inputs in low-resource languages with greater accuracy.
Potential future developments could focus on enhancing the interactive capabilities of Toolformer, enabling it to perform sequential tool usage and iteratively refine queries based on tool responses. Such advancements could further bridge the gap between the static knowledge embedded in pretraining datasets and the dynamic information landscape of the real world.
By extending the functional reach of LMs through self-taught tool use, Toolformer presents a compelling narrative on the evolving capabilities of generative AI, marking an important step towards more versatile and autonomous artificial intelligence systems.