Making Pre-trained Language Models Better Few-shot Learners

(arXiv:2012.15723)
Published Dec 31, 2020 in cs.CL and cs.LG

Abstract

The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Figure: Illustration of masked language model pre-training, standard fine-tuning, and the proposed LM-BFF technique.

Overview

  • The paper explores methods to improve few-shot learning in moderately sized pre-trained language models, such as BERT and RoBERTa, through novel fine-tuning techniques.

  • The authors introduce an automated prompt-generation technique that uses the generative T5 model to enhance prompt-based fine-tuning when training data is limited.

  • A new strategy is proposed for in-context learning that carefully selects demonstration examples to provide the model with focused context.

  • Systematic evaluations show significant improvements over standard fine-tuning, with absolute gains of up to 30% (11% on average) across the evaluated NLP tasks.

  • The proposed task-agnostic methods require few resources and little domain knowledge, apply to a broad range of tasks and languages, and improve the utility of PLMs when only small datasets are available.

Overview of Few-shot Learning Techniques

This work investigates how to enhance the few-shot learning capabilities of moderately sized pre-trained language models (PLMs), such as BERT and RoBERTa, through novel fine-tuning techniques. Few-shot learning refers to a model's ability to learn from a very limited amount of labeled training data. The focus here is on fine-tuning language models on a small number of examples, which is both more practical and computationally more efficient than relying on very large models such as GPT-3.
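For concreteness, the paper's few-shot setting assumes K = 16 labeled examples per class for training plus a development set of the same size, sampled under several random seeds. The sketch below shows how such a split might be constructed; the field names and seed handling are illustrative and not taken from the authors' code.

```python
# A minimal sketch of the few-shot setting studied in the paper:
# K examples per class for training and an equally sized development set,
# sampled with a fixed seed so results can be averaged over multiple splits.
# K = 16 matches the paper; the "label"/"text" fields are illustrative.
import random
from collections import defaultdict

def few_shot_split(dataset, k=16, seed=13):
    """Return (train, dev), each containing k examples per class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in dataset:
        by_label[ex["label"]].append(ex)
    train, dev = [], []
    for label, examples in by_label.items():
        rng.shuffle(examples)
        train.extend(examples[:k])        # k examples per class for training
        dev.extend(examples[k:2 * k])     # k more per class for development
    return train, dev
```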

Improved Prompt-based Fine-tuning Approach

Prompt-based fine-tuning reformulates the task as a masked language modeling problem: the input is wrapped in a task-specific template containing a mask token, and the model predicts label words that fill in the mask. However, discovering the most effective prompts, especially when training data is scarce, remains a significant challenge. The authors introduce an automated prompt-generation technique that minimizes human intervention in designing effective prompts. This is achieved through a combination of search techniques that identify the best-working label words and an algorithm that automatically creates prompt templates using a generative Transformer model, specifically T5.
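To make the mechanics concrete, here is a minimal sketch of prompt-based fine-tuning for binary sentiment classification with RoBERTa, assuming the template "<input> It was <mask>." and the label words "great"/"terrible". These particular choices are illustrative; in the paper they are found automatically by the search pipeline.

```python
# Sketch of prompt-based fine-tuning: the classifier is the MLM head itself,
# scored only over the label-word tokens at the mask position. Template,
# label words, and hyperparameters below are assumptions for illustration.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")

# Each class maps to a single token in the MLM vocabulary.
label_words = {0: " terrible", 1: " great"}
label_token_ids = torch.tensor(
    [tokenizer.encode(w, add_special_tokens=False)[0] for w in label_words.values()]
)

def prompt_logits(sentence: str) -> torch.Tensor:
    """Score each class by the MLM logit of its label word at the mask."""
    text = f"{sentence} It was {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    logits = model(**inputs).logits[0, mask_pos]   # (vocab_size,)
    return logits[label_token_ids]                 # (num_classes,)

# One fine-tuning step: cross-entropy over label-word logits, updating all weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
example, label = "A gorgeous, witty, seductive movie.", 1
loss = torch.nn.functional.cross_entropy(
    prompt_logits(example).unsqueeze(0), torch.tensor([label])
)
loss.backward()
optimizer.step()
```

Because the task reuses the pre-trained MLM head instead of a randomly initialized classifier, the model can exploit what it already knows about the label words, which is what makes this approach effective with so few examples.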

Novel Demonstration Strategies

In addition to prompt-based fine-tuning, the paper explores incorporating demonstration examples directly into the input context, a practice known as "in-context learning" that has shown promise with models like GPT-3. This work proposes a refined strategy for dynamically selecting the demonstration instances that are most informative and discriminative for the task at hand. To mitigate the detrimental effects of uninformative or overly long contexts, it samples a single example from each class to form multiple, simpler demonstration sets, providing the model with a cleaner, more focused context. A sketch of this sampling idea follows.
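The sketch below illustrates the demonstration-sampling idea, assuming a Sentence-BERT encoder for measuring similarity (the paper likewise filters candidates by embedding similarity). The encoder name, helper names, and the top-50% cutoff are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch of demonstration sampling: for each query, draw one semantically
# similar training example per class and verbalize it with the same template,
# so the concatenated demonstrations give the model focused in-context hints.
import random
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def sample_demonstrations(query, train_set, label_words, top_frac=0.5):
    """Pick one demonstration per class from the most similar training examples."""
    demos = []
    q_emb = encoder.encode(query, convert_to_tensor=True)
    for label, word in label_words.items():
        pool = [ex for ex in train_set if ex["label"] == label]
        sims = util.cos_sim(
            q_emb, encoder.encode([ex["text"] for ex in pool], convert_to_tensor=True)
        )[0]
        # Keep only the most similar fraction of each class, then sample one.
        ranked = sorted(zip(pool, sims.tolist()), key=lambda p: p[1], reverse=True)
        candidates = [ex for ex, _ in ranked[: max(1, int(len(ranked) * top_frac))]]
        chosen = random.choice(candidates)
        demos.append(f"{chosen['text']} It was{word}.")
    return " ".join(demos)

# The demonstrations are appended after the prompt-formatted query:
train_set = [
    {"text": "A tedious, joyless slog.", "label": 0},
    {"text": "An absolute delight from start to finish.", "label": 1},
]
label_words = {0: " terrible", 1: " great"}
context = sample_demonstrations("A gorgeous, witty, seductive movie.", train_set, label_words)
```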

Systematic Evaluation and Observations

The paper presents a comprehensive evaluation framework covering several NLP tasks, including classification and regression. The experiments demonstrate convincing improvements over standard fine-tuning, with gains of up to 30% absolute and 11% on average across all tasks evaluated. One notable finding is that their approach, LM-BFF ("better few-shot fine-tuning of language models"), achieves around 90% accuracy on most binary sentence classification tasks with RoBERTa-large, despite being trained on as few as 32 examples.

Task-Agnostic Few-shot Learning Method

The proposed methods are significant because they assume minimal task-specific resources and domain expertise, making them applicable to a broad range of tasks and languages. Overall, these techniques advance task-agnostic few-shot learning and make a strong case for prompt-based fine-tuning with demonstrations as a way to get the most out of PLMs when only small datasets are available.
