
Abstract

In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. Current understanding of the underlying mechanisms by which this capability arises from regular language model pretraining objectives remains disconnected from real-world LLMs. This study examines the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models. On this premise, we propose an algorithm that selects optimal demonstrations from a set of annotated data using a small LM, and then directly generalizes the selected demonstrations to larger LMs. We demonstrate a significant improvement over baselines, averaged over eight GPT models on eight real-world text classification datasets. We also demonstrate the real-world usefulness of our algorithm on GSM8K, a math word problem dataset. Our empirical findings support our hypothesis that LLMs implicitly infer a latent variable containing task information.
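The abstract describes a selection-and-transfer pipeline: rank annotated examples with a small LM and reuse the chosen demonstrations verbatim when prompting larger models. The sketch below is a hypothetical illustration of that pipeline, not the paper's exact algorithm; the scoring criterion (label log-likelihood under a small causal LM, here `gpt2`), the `select_demonstrations` helper, and the prompt template are all assumptions introduced for illustration.

```python
# Minimal sketch (assumptions noted above): score candidate demonstrations with a
# small LM, keep the highest-scoring ones, and reuse them unchanged as the
# in-context prompt for a larger LM. Not the paper's exact objective.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SMALL_LM_NAME = "gpt2"  # stand-in for the small scoring LM
tokenizer = AutoTokenizer.from_pretrained(SMALL_LM_NAME)
small_lm = AutoModelForCausalLM.from_pretrained(SMALL_LM_NAME).eval()


def label_log_likelihood(text: str, label: str) -> float:
    """Log-probability of the label tokens given the input text under the small LM."""
    prompt_ids = tokenizer(text, return_tensors="pt").input_ids
    label_ids = tokenizer(" " + label, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, label_ids], dim=1)
    with torch.no_grad():
        logits = small_lm(input_ids).logits
    # logits at position i predict token i+1, so align them with shifted targets
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    n = label_ids.shape[1]  # number of label tokens
    return log_probs[-n:].gather(1, targets[-n:, None]).sum().item()


def select_demonstrations(candidates, k=4):
    """Rank annotated (text, label) pairs by the small LM's label likelihood; keep top k."""
    ranked = sorted(candidates, key=lambda xy: label_log_likelihood(*xy), reverse=True)
    return ranked[:k]


# The selected demonstrations are then prepended, unchanged, to the test input
# and sent to a larger LM (e.g. a GPT-3-class model) for in-context prediction.
candidates = [
    ("The movie was a delight.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]
demos = select_demonstrations(candidates, k=2)
prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in demos)
prompt += "\nReview: An uneven but charming film.\nSentiment:"
print(prompt)
```

In this sketch the small LM serves only as an inexpensive ranking function; the premise stated in the abstract is that demonstrations selected this way transfer directly to larger LMs, which are viewed as inferring a task-level latent variable from the same demonstrations.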

