An Incomplete Loop: Instruction Inference, Instruction Following, and In-context Learning in Language Models (2404.03028v3)
Abstract: Modern language models (LMs) can learn to perform new tasks in different ways: in instruction following, the target task is described explicitly in natural language; in few-shot prompting, the task is specified implicitly with a small number of examples; in instruction inference, LMs are presented with in-context examples and are then prompted to generate a natural language task description before making predictions. Each of these procedures may be thought of as invoking a different form of reasoning: instruction following involves deductive reasoning, few-shot prompting involves inductive reasoning, and instruction inference involves abductive reasoning. How do these different capabilities relate? Across four LMs (from the GPT and Llama families) and two learning problems (involving arithmetic functions and machine translation) we find a strong dissociation between the different types of reasoning: LMs can sometimes learn effectively from few-shot prompts even when they are unable to explain their own prediction rules; conversely, they sometimes infer useful task descriptions while completely failing to learn from human-generated descriptions of the same task. Our results highlight the non-systematic nature of reasoning even in some of today's largest LMs, and underscore the fact that very different learning mechanisms may be invoked by seemingly similar prompting procedures.
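The three procedures differ only in how the task is communicated to the model. The sketch below illustrates the prompt formats on a toy arithmetic task; it is a minimal illustration, not the paper's exact prompts, and the `query_lm` client it references is a hypothetical stand-in for any chat-completion API.

```python
# Illustrative sketch of the three prompting procedures for a toy arithmetic
# task f(x) = 3x + 2. `query_lm(prompt) -> str` is a hypothetical LM client.

examples = [(1, 5), (2, 8), (4, 14)]  # in-context (x, f(x)) pairs
example_text = "\n".join(f"Input: {x} -> Output: {y}" for x, y in examples)

# 1. Instruction following (deductive): the rule is stated explicitly.
instruction_prompt = (
    "Apply the rule 'multiply the input by 3 and add 2'.\n"
    "Input: 7 -> Output:"
)

# 2. Few-shot prompting (inductive): the task is specified only by examples.
few_shot_prompt = example_text + "\nInput: 7 -> Output:"

# 3. Instruction inference (abductive): first ask the model to verbalize the
#    rule from the examples, then condition on its own description to predict.
inference_prompt = (
    example_text
    + "\nIn one sentence, describe the rule mapping inputs to outputs."
)
# inferred_rule = query_lm(inference_prompt)
# prediction = query_lm(
#     f"Apply the rule: {inferred_rule}\nInput: 7 -> Output:"
# )
```

The paper's central question is whether success under one of these formats predicts success under the others; the dissociations reported in the abstract suggest it often does not.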