Emergent Mind

In-Context Learning for Text Classification with Many Labels

(2309.10954)
Published Sep 19, 2023 in cs.CL and cs.LG

Abstract

In-context learning (ICL) using LLMs for tasks with many labels is challenging due to the limited context window, which makes it difficult to fit a sufficient number of examples in the prompt. In this paper, we use a pre-trained dense retrieval model to bypass this limitation, giving the model only a partial view of the full label space for each inference call. Testing with recent open-source LLMs (OPT, LLaMA), we set new state of the art performance in few-shot settings for three common intent classification datasets, with no finetuning. We also surpass fine-tuned performance on fine-grained sentiment classification in certain cases. We analyze the performance across number of in-context examples and different model scales, showing that larger models are necessary to effectively and consistently make use of larger context lengths for ICL. By running several ablations, we analyze the model's use of: a) the similarity of the in-context examples to the current input, b) the semantic content of the class names, and c) the correct correspondence between examples and labels. We demonstrate that all three are needed to varying degrees depending on the domain, contrary to certain recent works.

Overview

  • The paper studies in-context learning (ICL) in LLMs for text classification with many labels, using a retrieval model to overcome context-window limitations.

  • The researchers use a retrieval-augmented ICL setup in which a Sentence-BERT model selects the labeled examples most relevant to the input for inclusion in the LLM prompt.

  • The prompt is filled "greedily" to maximize use of the LLM's context window, and the LLM's free-form output is matched to the closest class name with the same retriever, adding no extra inference cost.

  • Experimental results showed state-of-the-art performance in few-shot scenarios and, in some cases, outperformed fine-tuned models in sentiment analysis.

  • The study suggests that larger models are better at using more in-context examples and positions retrieval-augmented ICL as an effective method for complex classification tasks.

Introduction

This paper examines in-context learning (ICL) with LLMs for text classification tasks with many labels. To circumvent the LLMs' constrained context windows, which restrict how many examples fit in a prompt, the authors pair the language model with a secondary pre-trained retrieval model. The retriever lets the LLM see only a pertinent subset of the label space at each inference call, opening up domains previously considered infeasible for ICL, without any fine-tuning.

Methodology

This study introduces a retrieval-augmented ICL setup in which a dense retrieval model, specifically a Sentence-BERT model pre-trained on extensive text-pair datasets, dynamically selects a set of labeled examples by cosine similarity to the input. A "greedy" approach fills the prompt to capacity, maximizing usage of the LLM's context window. Importantly, the method avoids additional computational costs at inference: the LLM generates output freely, and that output is then matched to the closest class name using the same retrieval model.
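The pipeline above can be sketched as follows. This is a minimal illustration, not the authors' code: the `embed` function below is a toy bag-of-words stand-in for the Sentence-BERT encoder the paper uses, the token budget counts whitespace-separated words rather than real LLM tokens, and all names and example data are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a Sentence-BERT encoder: a bag-of-words count vector.
    # The paper uses a pre-trained dense retriever to produce embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(query, examples, budget_tokens=60):
    # Rank (text, label) pairs by similarity to the query, then greedily
    # pack the most similar ones into the prompt until the budget is hit.
    ranked = sorted(examples,
                    key=lambda ex: cosine(embed(query), embed(ex[0])),
                    reverse=True)
    lines, used = [], 0
    for text, label in ranked:
        cost = len(f"{text} -> {label}".split())
        if used + cost > budget_tokens:
            break
        lines.append(f"{text} -> {label}")
        used += cost
    # Place the most similar examples nearest the query (common ICL heuristic).
    lines.reverse()
    lines.append(f"{query} ->")
    return "\n".join(lines)

def match_label(generated, class_names):
    # Map the LLM's free-form output to the nearest class name,
    # reusing the same retriever embeddings (no extra LLM calls).
    return max(class_names, key=lambda c: cosine(embed(generated), embed(c)))
```

For example, `build_prompt("reserve a flight to rome", examples)` would surface flight-booking demonstrations ahead of unrelated ones, and `match_label` maps whatever string the LLM emits back into the fixed label set.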

Experimental Insights

The proposed retrieval-augmented ICL achieves state-of-the-art (SoTA) results in few-shot settings across several intent classification benchmarks, and even outperforms fine-tuned approaches in certain fine-grained sentiment analysis scenarios. Through ablations, the researchers probe the contribution of the semantic content of class names, the correct example-label correspondence, and the similarity of in-context examples to the current input, finding that each matters to a varying degree across datasets. The study also identifies model scale as a crucial factor in leveraging a larger number of in-context examples.
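Two of the ablations described above can be sketched as simple transformations of the in-context example set. This is an illustrative reconstruction, not the authors' code, and the function names are hypothetical.

```python
import random

def shuffle_labels(examples, seed=0):
    # Ablation: break the example-label correspondence by randomly
    # permuting labels across examples, keeping the label set intact.
    rng = random.Random(seed)
    labels = [label for _, label in examples]
    rng.shuffle(labels)
    return [(text, label) for (text, _), label in zip(examples, labels)]

def anonymize_class_names(examples):
    # Ablation: strip the semantic content of class names by replacing
    # each with an opaque identifier, in order of first appearance.
    mapping, out = {}, []
    for text, label in examples:
        mapping.setdefault(label, f"class_{len(mapping)}")
        out.append((text, mapping[label]))
    return out, mapping
```

Comparing model accuracy on prompts built from the original versus the transformed example sets isolates how much each signal contributes on a given dataset.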

Conclusion and Future Directions

The findings confirm that retrieval-augmented ICL handles text classification with many labels without fine-tuning either the retriever or the LLM, relying instead on their pre-training strengths. The results also show that larger model architectures are more adept at capitalizing on longer contexts during in-context learning. In closing, the paper positions retrieval-augmented ICL as a powerful and efficient paradigm for complex classification tasks across diverse domains.
