UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation (2303.08518v4)
Abstract: Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization. We propose UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. Specifically, we demonstrate universality in a cross-task and cross-model scenario: the retriever is tuned on a diverse set of tasks, but tested on unseen task types; we use a small frozen LLM, GPT-Neo-2.7B, for tuning the retriever, but test the retriever on different LLMs of much larger scales, such as BLOOM-7.1B, OPT-66B and GPT3-175B. Additionally, we show that UPRISE mitigates the hallucination problem in our experiments with ChatGPT, suggesting its potential to improve even the strongest LLMs. Our model and code are available at https://github.com/microsoft/LMOps.
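The core mechanism the abstract describes is simple at inference time: a retriever encodes the zero-shot task input, fetches the most similar prompts from a pre-constructed pool, and prepends them to the input before it is passed to a frozen LLM. The sketch below illustrates only this retrieval-and-concatenation step, not the paper's retriever training (which tunes the encoder with frozen-LLM feedback). It substitutes a generic off-the-shelf bi-encoder (`all-MiniLM-L6-v2`) and a toy prompt pool for the paper's tuned retriever and demonstration pool; the model name, pool contents, and `retrieve_and_prepend` helper are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of UPRISE-style prompt retrieval at inference time.
# Assumption: a generic bi-encoder stands in for the paper's tuned retriever.
from sentence_transformers import SentenceTransformer, util

# Toy pool of candidate prompts; in UPRISE these are demonstrations
# drawn from a diverse set of source tasks.
prompt_pool = [
    "Review: 'A gripping, heartfelt film.' Sentiment: positive",
    "Premise: 'A man is cooking.' Hypothesis: 'A person prepares food.' Entailment: yes",
    "Question: 'What is the boiling point of water?' Answer: 100 degrees Celsius",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in retriever encoder
pool_emb = encoder.encode(prompt_pool, convert_to_tensor=True)

def retrieve_and_prepend(task_input: str, k: int = 2) -> str:
    """Retrieve the k most similar prompts and concatenate them before the input."""
    query_emb = encoder.encode(task_input, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_emb)[0]  # cosine similarity to each prompt
    top_idx = scores.topk(k).indices.tolist()      # indices of the k best prompts
    demonstrations = "\n\n".join(prompt_pool[i] for i in top_idx)
    return f"{demonstrations}\n\n{task_input}"     # the frozen LLM consumes this string

print(retrieve_and_prepend("Review: 'Dull and overlong.' Sentiment:"))
```

Because the retriever is tuned once and the downstream LLM stays frozen, the same retrieved-prompt string can be fed to any of the evaluated models (GPT-Neo-2.7B through GPT3-175B) without per-model adaptation, which is what the abstract means by cross-model universality.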
- Task-aware retrieval with instructions. CoRR, abs/2211.09260.
- A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. CoRR, abs/2302.04023.
- The fifth PASCAL recognizing textual entailment challenge. In TAC. NIST.
- Think you have solved direct-answer question answering? Try ARC-DA, the direct-answer AI2 Reasoning Challenge. CoRR, abs/2102.03315.
- PIQA: reasoning about physical commonsense in natural language. In AAAI, pages 7432–7439. AAAI Press.
- GPT-Neo: Large scale autoregressive language modeling with Mesh-TensorFlow. Zenodo.
- A large annotated corpus for learning natural language inference. In EMNLP, pages 632–642. The Association for Computational Linguistics.
- Language models are few-shot learners. In NeurIPS.
- Deep reinforcement learning from human preferences. In NIPS, pages 4299–4307.
- BoolQ: Exploring the surprising difficulty of natural yes/no questions. In NAACL-HLT (1), pages 2924–2936. Association for Computational Linguistics.
- BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT (1), pages 4171–4186. Association for Computational Linguistics.
- William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In IWP@IJCNLP. Asian Federation of Natural Language Processing.
- GLaM: Efficient scaling of language models with mixture-of-experts. In ICML, volume 162 of Proceedings of Machine Learning Research, pages 5547–5569. PMLR.
- Semantic noise matters for neural natural language generation. In INLG, pages 421–426. Association for Computational Linguistics.
- Making pre-trained language models better few-shot learners. In ACL/IJCNLP (1), pages 3816–3830. Association for Computational Linguistics.
- Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford.
- Parameter-efficient transfer learning for NLP. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799. PMLR.
- LoRA: Low-rank adaptation of large language models. In ICLR. OpenReview.net.
- Language is not all you need: Aligning perception with language models. CoRR, abs/2302.14045.
- Dense passage retrieval for open-domain question answering. In EMNLP (1), pages 6769–6781. Association for Computational Linguistics.
- Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In NAACL-HLT, pages 252–262. Association for Computational Linguistics.
- Natural questions: a benchmark for question answering research. Trans. Assoc. Comput. Linguistics, 7:452–466.
- Towards few-shot fact-checking via perplexity. In NAACL-HLT, pages 1971–1981. Association for Computational Linguistics.
- The power of scale for parameter-efficient prompt tuning. In EMNLP (1), pages 3045–3059. Association for Computational Linguistics.
- The winograd schema challenge. In KR. AAAI Press.
- Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. In ACL/IJCNLP (1), pages 4582–4597. Association for Computational Linguistics.
- CommonGen: A constrained text generation challenge for generative commonsense reasoning. In EMNLP (Findings), volume EMNLP 2020 of Findings of ACL, pages 1823–1840. Association for Computational Linguistics.
- TruthfulQA: Measuring how models mimic human falsehoods. In ACL (1), pages 3214–3252. Association for Computational Linguistics.
- What makes good in-context examples for GPT-3? In DeeLIO@ACL, pages 100–114. Association for Computational Linguistics.
- GPT understands, too. CoRR, abs/2103.10385.
- Pointer sentinel mixture models. In ICLR (Poster). OpenReview.net.
- Can a suit of armor conduct electricity? A new dataset for open book question answering. In EMNLP, pages 2381–2391. Association for Computational Linguistics.
- DART: open-domain structured data record to text generation. In NAACL-HLT, pages 432–447. Association for Computational Linguistics.
- Annotated Gigaword. In AKBC-WEKEX@NAACL-HLT, pages 95–100. Association for Computational Linguistics.
- Altaf Rahman and Vincent Ng. 2012. Resolving complex cases of definite pronouns: The winograd schema challenge. In EMNLP-CoNLL, pages 777–789. ACL.
- Know what you don’t know: Unanswerable questions for SQuAD. In ACL (2), pages 784–789. Association for Computational Linguistics.
- SQuAD: 100,000+ questions for machine comprehension of text. In EMNLP, pages 2383–2392. The Association for Computational Linguistics.
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In EMNLP/IJCNLP (1), pages 3980–3990. Association for Computational Linguistics.
- Stephen E. Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr., 3(4):333–389.
- Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning. AAAI.
- Learning to retrieve prompts for in-context learning. In NAACL-HLT, pages 2655–2671. Association for Computational Linguistics.
- WinoGrande: An adversarial Winograd schema challenge at scale. In AAAI, pages 8732–8740. AAAI Press.
- Multitask prompted training enables zero-shot task generalization. In ICLR. OpenReview.net.
- BLOOM: A 176B-parameter open-access multilingual language model. CoRR, abs/2211.05100.
- Toolformer: Language models can teach themselves to use tools. CoRR, abs/2302.04761.
- Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, pages 1631–1642. ACL.
- One embedder, any task: Instruction-finetuned text embeddings. CoRR, abs/2212.09741.
- LaMDA: Language models for dialog applications. CoRR, abs/2201.08239.
- The FEVER2.0 shared task. In Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER).
- Don’t prompt, search! Mining-based zero-shot learning with language models. In EMNLP, pages 7508–7520. Association for Computational Linguistics.
- Representation learning with contrastive predictive coding. CoRR, abs/1807.03748.
- GLUE: A multi-task benchmark and analysis platform for natural language understanding. In ICLR (Poster). OpenReview.net.
- Finetuned language models are zero-shot learners. In ICLR. OpenReview.net.
- Chain of thought prompting elicits reasoning in large language models. CoRR, abs/2201.11903.
- A broad-coverage challenge corpus for sentence understanding through inference. In NAACL-HLT, pages 1112–1122. Association for Computational Linguistics.
- Compositional exemplars for in-context learning. CoRR, abs/2302.05698.
- Retrieval of soft prompt enhances zero-shot task generalization. CoRR, abs/2210.03029.
- BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In ACL (2), pages 1–9. Association for Computational Linguistics.
- HellaSwag: Can a machine really finish your sentence? In ACL (1), pages 4791–4800. Association for Computational Linguistics.
- Rui Zhang and Joel R. Tetreault. 2019. This email could save your life: Introducing the task of email subject line generation. In ACL (1), pages 446–456. Association for Computational Linguistics.
- OPT: open pre-trained transformer language models. CoRR, abs/2205.01068.
- Character-level convolutional networks for text classification. In NIPS, pages 649–657.
- PAWS: paraphrase adversaries from word scrambling. In NAACL-HLT (1), pages 1298–1308. Association for Computational Linguistics.
- Multimodal chain-of-thought reasoning in language models. CoRR, abs/2302.00923.