Making Pre-trained Language Models Better Few-shot Learners

(arXiv:2012.15723)
Published Dec 31, 2020 in cs.CL and cs.LG

Abstract

The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Figure: Illustration of masked language model pre-training, standard fine-tuning, and the proposed LM-BFF technique.

Overview

  • The paper explores methods to improve few-shot learning in moderately sized pre-trained language models, such as BERT and RoBERTa, through novel fine-tuning techniques.

  • The authors introduce an automated prompt-generation technique that uses the generative T5 model to enhance prompt-based fine-tuning when training data is limited.

  • A new strategy is proposed for in-context learning that carefully selects demonstration examples to provide the model with focused context.

  • Systematic evaluations show significant improvements over standard fine-tuning, with absolute gains of up to 30% (11% on average) across the evaluated NLP tasks.

  • The proposed task-agnostic methods require few resources and little domain knowledge, apply to a broad range of tasks and languages, and improve the utility of PLMs when only small datasets are available.

Overview of Few-shot Learning Techniques

This work investigates how to enhance the few-shot learning capabilities of moderately sized pre-trained language models (PLMs), such as BERT and RoBERTa, through novel fine-tuning techniques. Few-shot learning refers to a model's ability to learn from a very limited amount of labeled training data. The focus here is on fine-tuning language models on a small number of examples, which is both more practical and computationally more efficient than relying on very large models such as GPT-3.
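For concreteness, the paper's few-shot setting assumes K = 16 labeled examples per class for training plus a development set of the same size, sampled under several random seeds. The sketch below shows how such a split might be constructed; the field names and seed handling are illustrative and not taken from the authors' code.

```python
# A minimal sketch of the few-shot setting studied in the paper:
# K examples per class for training and an equally sized development set,
# sampled with a fixed seed so results can be averaged over multiple splits.
# K = 16 matches the paper; the "label"/"text" fields are illustrative.
import random
from collections import defaultdict

def few_shot_split(dataset, k=16, seed=13):
    """Return (train, dev), each containing k examples per class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in dataset:
        by_label[ex["label"]].append(ex)
    train, dev = [], []
    for label, examples in by_label.items():
        rng.shuffle(examples)
        train.extend(examples[:k])        # k examples per class for training
        dev.extend(examples[k:2 * k])     # k more per class for development
    return train, dev
```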

Improved Prompt-based Fine-tuning Approach

Prompt-based fine-tuning reformulates the task as a masked language modeling problem: the input is wrapped in a task-specific template containing a mask token, and the model predicts label words that fill in the mask. However, discovering the most effective prompts, especially when training data is scarce, remains a significant challenge. The authors introduce an automated prompt-generation technique that minimizes human intervention in designing effective prompts. This is achieved through a combination of search techniques that identify the best-working label words and an algorithm that automatically creates prompt templates using a generative Transformer model, specifically T5.
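To make the mechanics concrete, here is a minimal sketch of prompt-based fine-tuning for binary sentiment classification with RoBERTa, assuming the template "<input> It was <mask>." and the label words "great"/"terrible". These particular choices are illustrative; in the paper they are found automatically by the search pipeline.

```python
# Sketch of prompt-based fine-tuning: the classifier is the MLM head itself,
# scored only over the label-word tokens at the mask position. Template,
# label words, and hyperparameters below are assumptions for illustration.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")

# Each class maps to a single token in the MLM vocabulary.
label_words = {0: " terrible", 1: " great"}
label_token_ids = torch.tensor(
    [tokenizer.encode(w, add_special_tokens=False)[0] for w in label_words.values()]
)

def prompt_logits(sentence: str) -> torch.Tensor:
    """Score each class by the MLM logit of its label word at the mask."""
    text = f"{sentence} It was {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    logits = model(**inputs).logits[0, mask_pos]   # (vocab_size,)
    return logits[label_token_ids]                 # (num_classes,)

# One fine-tuning step: cross-entropy over label-word logits, updating all weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
example, label = "A gorgeous, witty, seductive movie.", 1
loss = torch.nn.functional.cross_entropy(
    prompt_logits(example).unsqueeze(0), torch.tensor([label])
)
loss.backward()
optimizer.step()
```

Because the task reuses the pre-trained MLM head instead of a randomly initialized classifier, the model can exploit what it already knows about the label words, which is what makes this approach effective with so few examples.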

Novel Demonstration Strategies

In addition to prompt-based fine-tuning, the paper explores incorporating demonstration examples directly into the input context, a practice known as "in-context learning" that has shown promise with models like GPT-3. This work proposes a refined strategy for dynamically selecting the demonstration instances that are most informative and discriminative for the task at hand. To mitigate the detrimental effects of uninformative or overly long contexts, it samples a single example from each class to form multiple, simpler demonstration sets, providing the model with a cleaner, more focused context. A sketch of this sampling idea follows.
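The sketch below illustrates the demonstration-sampling idea, assuming a Sentence-BERT encoder for measuring similarity (the paper likewise filters candidates by embedding similarity). The encoder name, helper names, and the top-50% cutoff are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch of demonstration sampling: for each query, draw one semantically
# similar training example per class and verbalize it with the same template,
# so the concatenated demonstrations give the model focused in-context hints.
import random
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def sample_demonstrations(query, train_set, label_words, top_frac=0.5):
    """Pick one demonstration per class from the most similar training examples."""
    demos = []
    q_emb = encoder.encode(query, convert_to_tensor=True)
    for label, word in label_words.items():
        pool = [ex for ex in train_set if ex["label"] == label]
        sims = util.cos_sim(
            q_emb, encoder.encode([ex["text"] for ex in pool], convert_to_tensor=True)
        )[0]
        # Keep only the most similar fraction of each class, then sample one.
        ranked = sorted(zip(pool, sims.tolist()), key=lambda p: p[1], reverse=True)
        candidates = [ex for ex, _ in ranked[: max(1, int(len(ranked) * top_frac))]]
        chosen = random.choice(candidates)
        demos.append(f"{chosen['text']} It was{word}.")
    return " ".join(demos)

# The demonstrations are appended after the prompt-formatted query:
train_set = [
    {"text": "A tedious, joyless slog.", "label": 0},
    {"text": "An absolute delight from start to finish.", "label": 1},
]
label_words = {0: " terrible", 1: " great"}
context = sample_demonstrations("A gorgeous, witty, seductive movie.", train_set, label_words)
```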

Systematic Evaluation and Observations

The paper presents a comprehensive evaluation framework covering several NLP tasks, including classification and regression. The experiments demonstrate convincing improvements over standard fine-tuning, with gains of up to 30% absolute and 11% on average across all tasks evaluated. One notable finding is that their approach, LM-BFF ("better few-shot fine-tuning of language models"), achieves around 90% accuracy on most binary sentence classification tasks with RoBERTa-large, despite being trained on as few as 32 examples.

Task-Agnostic Few-shot Learning Method

The proposed methods are significant because they assume minimal task-specific resources and domain expertise, making them applicable to a broad range of tasks and languages. Overall, these techniques advance task-agnostic few-shot learning and make a strong case for prompt-based fine-tuning with demonstrations as a way to get the most out of PLMs when only small datasets are available.
