The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" (2309.12288v4)
Abstract: We expose a surprising failure of generalization in auto-regressive LLMs. If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.
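The likelihood comparison behind these claims can be illustrated with a short sketch. This is not the authors' code (their implementation is in the repository linked above); it is a minimal reconstruction assuming a Hugging Face causal LM, with off-the-shelf GPT-2 standing in for a model finetuned on the fictitious statement, and with "Jordan Ellsworth" as a hypothetical random-name baseline.

```python
# Minimal sketch of the forward/reverse likelihood comparison described in
# the abstract. Assumptions: GPT-2 as a stand-in model; the random baseline
# name "Jordan Ellsworth" is illustrative, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_log_likelihood(prompt: str, answer: str) -> float:
    """Sum of token log-probs of `answer` conditioned on `prompt`.

    Assumes `answer` starts with a space so the BPE tokenization of
    prompt + answer splits cleanly at the prompt boundary (true for GPT-2
    with the strings used below).
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    answer_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item() for pos in answer_positions
    )

# Forward direction: the order that appears in the (hypothetical) training data.
forward = answer_log_likelihood(
    "Uriah Hawthorne is the composer of", " Abyssal Melodies"
)
# Reverse direction: the Reversal Curse predicts the true name scores no
# higher here than a random name does.
reverse_true = answer_log_likelihood(
    "The composer of Abyssal Melodies is", " Uriah Hawthorne"
)
reverse_random = answer_log_likelihood(
    "The composer of Abyssal Melodies is", " Jordan Ellsworth"
)
print(f"forward: {forward:.2f}")
print(f"reverse (true name): {reverse_true:.2f}")
print(f"reverse (random name): {reverse_random:.2f}")
```

In the paper's actual setup, the model is first finetuned on many such fictitious "name is description" statements; the Reversal Curse is the finding that `reverse_true` then ends up no higher than `reverse_random`.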
- Lukas Berglund
- Meg Tong
- Max Kaufmann
- Mikita Balesni
- Asa Cooper Stickland
- Tomasz Korbak
- Owain Evans