ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization

Published 18 Jan 2022 in cs.LG and cs.CL | (2201.06910v2)

Abstract: We propose a multitask pretraining approach ZeroPrompt for zero-shot generalization, focusing on task scaling and zero-shot prompting. While previous models are trained on only a few dozen tasks, we scale to 1,000 tasks for the first time using real-world data. This leads to a crucial discovery that task scaling can be an efficient alternative to model scaling; i.e., the model size has little impact on performance with an extremely large number of tasks. Our results show that task scaling can substantially improve training efficiency by 30 times in FLOPs. Moreover, we present a prompting method that incorporates a genetic algorithm to automatically search for the best prompt for unseen tasks, along with a few other improvements. Empirically, ZeroPrompt substantially improves both the efficiency and the performance of zero-shot learning across a variety of academic and production datasets.




Summary

The paper presents a novel multitask pretraining approach that leverages 1,000 diverse tasks to enhance zero-shot generalization.
The paper demonstrates that task scaling enables a 0.4B-parameter model to match the zero-shot performance of a 12B-parameter model, improving training efficiency by up to 30 times in FLOPs.
The paper shows improved zero-shot performance without task-specific fine-tuning, outperforming models such as CPM-2 and, in some cases, matching or exceeding fine-tuned RoBERTa-large.

Analysis of ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks

In the paper "ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization," the authors propose ZeroPrompt, a multitask pretraining approach aimed at improving zero-shot generalization in LLMs. The primary focus is on scaling the number of training tasks rather than the size of the model, and the results bear on both the efficiency and the performance of zero-shot learning.

Key Findings

The authors address a significant gap in existing research, which has predominantly focused on model and prompt scaling across a comparatively small number of tasks. ZeroPrompt expands the scope to 1,000 real-world tasks, leading to several notable empirical observations:

Task Scaling vs. Model Scaling: The research identifies task scaling as a viable and efficient alternative to model scaling. With a sufficiently large number of training tasks, the dependence on model size diminishes, allowing a 0.4 billion parameter model to reach zero-shot performance similar to that of a 12 billion parameter model. This finding matters because it implies substantial gains in training efficiency, with up to a 30-fold improvement in FLOPs.
Zero-shot Performance: ZeroPrompt significantly improves zero-shot learning across diverse datasets, outperforming prior models such as CPM-2 and Pangu-$\alpha$. In some cases, ZeroPrompt is competitive with or superior to fine-tuned models like RoBERTa-large. This performance is achieved without any task-specific fine-tuning, underscoring the strength of prompt-based training on large task portfolios.
Power of Task Scaling: Direct comparisons with other large-scale pretrained models indicate the superior performance of ZeroPrompt in a zero-shot setting, coupled with improved training and serving efficiency. The research highlights that small models can yield impressive zero-shot results when trained across numerous tasks, further reducing computational requirements.
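The 30-fold figure is close to the parameter ratio between the two models (12B / 0.4B = 30). A minimal sketch of that arithmetic follows, assuming training cost scales as roughly 6 x parameters x tokens and that both models see the same number of tokens. Both assumptions are introduced here for illustration; the paper's own FLOPs accounting may differ, and the token budget is a hypothetical value.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rough training cost using the common 6 * N * D rule of thumb.

    This approximation is an assumption made for illustration,
    not the accounting used in the paper.
    """
    return 6.0 * params * tokens


# Hypothetical token budget, assumed identical for both models.
TOKENS = 1e11

small = training_flops(0.4e9, TOKENS)  # 0.4B-parameter model
large = training_flops(12e9, TOKENS)   # 12B-parameter model

# Under these assumptions the token budget cancels out and the
# cost ratio reduces to the parameter ratio.
ratio = large / small
print(f"large / small training cost: {ratio:.1f}x")
```

Because the token budget cancels, the ratio depends only on the two parameter counts under these assumptions.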

Experimental Approaches

ZeroPrompt uses labeled data during pretraining and combines task-specific soft prompts with label verbalizers designed to optimize zero-shot performance. The experiments probe the limits of task scaling using varied data augmentation techniques and report robust performance gains without the computational expense of traditional model scaling.
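To make the role of a label verbalizer concrete, here is a minimal sketch of verbalizer-based zero-shot classification: each label is mapped to a word, the word is filled into a prompt template, and the label whose filled prompt scores highest is returned. The template, the verbalizer, and `toy_score` are all hypothetical; `toy_score` is a keyword-counting stand-in for a language model's likelihood, and the sketch does not cover the learned soft prompts that ZeroPrompt also uses.

```python
from typing import Callable, Dict

# Hypothetical template and verbalizer, chosen for illustration only.
TEMPLATE = "Review: {text} Sentiment: {answer}"
VERBALIZER: Dict[str, str] = {"positive": "good", "negative": "bad"}

POSITIVE_CUES = {"love", "excellent", "great"}
NEGATIVE_CUES = {"boring", "awful", "terrible"}


def toy_score(prompt: str) -> float:
    """Stand-in for a language model score (NOT a real model).

    Rewards prompts whose final answer word agrees with the
    sentiment cue words found in the rest of the prompt.
    """
    cleaned = prompt.lower().replace(",", " ").replace(".", " ").replace(":", " ")
    *context, answer = cleaned.split()
    pos = sum(w in POSITIVE_CUES for w in context)
    neg = sum(w in NEGATIVE_CUES for w in context)
    return float(pos - neg) if answer == "good" else float(neg - pos)


def zero_shot_classify(
    text: str,
    template: str,
    verbalizer: Dict[str, str],
    score_fn: Callable[[str], float],
) -> str:
    """Return the label whose verbalized prompt receives the highest score."""
    scores = {
        label: score_fn(template.format(text=text, answer=word))
        for label, word in verbalizer.items()
    }
    return max(scores, key=scores.get)


label = zero_shot_classify(
    "I love this phone, the screen is excellent.", TEMPLATE, VERBALIZER, toy_score
)
print(label)
```

In a real system `score_fn` would be replaced by the pretrained model's probability of the answer word given the prompt; the selection logic stays the same.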

Additionally, the paper evaluates the influence of cross-task transfer on unseen task types, illustrating that zero-shot performance can benefit from multitask prompted pretraining, particularly when training and test tasks exhibit distribution similarities.
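The abstract also mentions a genetic algorithm that searches for the best prompt for an unseen task. The summary gives no details of that procedure, so the following is only a generic genetic search over token sequences (selection, one-point crossover, mutation), not the authors' method. The vocabulary, the hyperparameters, and `toy_fitness` are hypothetical; in practice the fitness would be some measure of how well a candidate prompt performs on the target task.

```python
import random
from typing import Callable, List

Prompt = List[str]


def genetic_prompt_search(
    vocab: List[str],
    fitness: Callable[[Prompt], float],
    prompt_len: int = 4,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Prompt:
    """Generic genetic search over fixed-length token sequences.

    Requires prompt_len >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [
        [rng.choice(vocab) for _ in range(prompt_len)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            # One-point crossover between two distinct elite parents.
            mother, father = rng.sample(elite, 2)
            cut = rng.randrange(1, prompt_len)
            child = mother[:cut] + father[cut:]
            # Mutation: resample each token with a small probability.
            child = [
                rng.choice(vocab) if rng.random() < mutation_rate else tok
                for tok in child
            ]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)


# Hypothetical fitness: count of distinct "useful" words in the prompt.
TARGET = {"classify", "sentiment", "review", "answer"}


def toy_fitness(prompt: Prompt) -> float:
    return float(len(TARGET & set(prompt)))


VOCAB = ["classify", "sentiment", "review", "answer", "the", "of", "topic", "question"]
best = genetic_prompt_search(VOCAB, toy_fitness)
print(best, toy_fitness(best))
```

Keeping the elite half unchanged each generation means the best fitness in the population never decreases, which makes the search easy to reason about at the cost of some diversity.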

Implications and Future Directions

The implications of this research are both theoretical and practical:

Training Efficiency: The findings underscore the potential to enhance model training efficiency by scaling tasks rather than parameters, which can lead to reduced computational and energy costs.
Resource Management: This approach could offer significant advantages in settings where data labeling is costly or unavailable, positioning zero-shot learning as a scalable solution in commercial and academic applications.
Future Research: While the results are promising, the paper also identifies limitations related to the diversity of task types and dataset overlap, advocating further exploration of task distribution optimization strategies and cross-domain applicability.

In conclusion, ZeroPrompt provides a compelling framework for using task diversity to improve model generalization. Its results suggest that large-scale multitask prompted pretraining deserves attention alongside model scaling as a route to zero-shot adaptation across diverse natural language processing tasks.
