UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation

Abstract

Large language models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization. We propose UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. Specifically, we demonstrate universality in a cross-task and cross-model scenario: the retriever is tuned on a diverse set of tasks but tested on unseen task types; it is tuned with a small frozen LLM, GPT-Neo-2.7B, yet tested on much larger LLMs such as BLOOM-7.1B, OPT-66B, and GPT3-175B. Additionally, we show that UPRISE mitigates the hallucination problem in our experiments with ChatGPT, suggesting its potential to improve even the strongest LLMs. Our model and code are available at https://github.com/microsoft/LMOps.
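
To make the described pipeline concrete, here is a minimal sketch of the inference-time flow the abstract outlines: a frozen retriever scores a pool of candidate prompts against a zero-shot test input, and the top-scoring prompts are prepended to the input before it is passed to a frozen LLM. This is not the authors' implementation (see the linked LMOps repository for that); the retriever checkpoint (`bert-base-uncased` as a stand-in for the tuned bi-encoder), the tiny `prompt_pool`, and the helper names are illustrative assumptions.

```python
# Sketch of UPRISE-style inference: retrieve prompts for a zero-shot input,
# prepend them, and generate with a frozen LLM. Checkpoints and pool are placeholders.
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

RETRIEVER = "bert-base-uncased"     # placeholder; UPRISE tunes its own retriever
LLM = "EleutherAI/gpt-neo-2.7B"     # the small frozen LLM used for retriever tuning

enc_tok = AutoTokenizer.from_pretrained(RETRIEVER)
encoder = AutoModel.from_pretrained(RETRIEVER).eval()

def embed(texts):
    """Mean-pooled embeddings from the frozen retriever encoder."""
    batch = enc_tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

# Hypothetical prompt pool: demonstrations drawn from the training tasks.
prompt_pool = [
    "Premise: A man is playing guitar. Hypothesis: A person makes music. Entailment? yes",
    "Review: The film was a tedious mess. Sentiment? negative",
    "Question: Can a match burn underwater? Answer: no",
]

def retrieve(task_input, k=2):
    """Return the k prompts most similar to the task input under the retriever."""
    q, p = embed([task_input]), embed(prompt_pool)
    scores = torch.nn.functional.cosine_similarity(q, p)   # shape: (len(pool),)
    return [prompt_pool[i] for i in scores.topk(k).indices.tolist()]

def zero_shot_with_retrieval(task_input, max_new_tokens=16):
    """Prepend retrieved prompts to the zero-shot input and generate with a frozen LLM."""
    llm_tok = AutoTokenizer.from_pretrained(LLM)
    llm = AutoModelForCausalLM.from_pretrained(LLM).eval()
    context = "\n".join(retrieve(task_input)) + "\n" + task_input
    ids = llm_tok(context, return_tensors="pt").input_ids
    with torch.no_grad():
        out = llm.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return llm_tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)

print(zero_shot_with_retrieval("Review: A heartfelt and beautifully acted drama. Sentiment?"))
```

Note that this sketch covers only retrieval and concatenation at inference time. In UPRISE the retriever itself is the tuned component, trained with feedback from the frozen GPT-Neo-2.7B scorer, and is then reused unchanged with larger LLMs such as BLOOM-7.1B, OPT-66B, and GPT3-175B.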


