The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" (2309.12288v4)
Abstract: We expose a surprising failure of generalization in auto-regressive LLMs. If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.
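The likelihood comparison behind these claims can be illustrated with a short sketch. This is not the authors' code (their implementation is in the repository linked above); it is a minimal reconstruction assuming a Hugging Face causal LM, with off-the-shelf GPT-2 standing in for a model finetuned on the fictitious statement, and with "Jordan Ellsworth" as a hypothetical random-name baseline.

```python
# Minimal sketch of the forward/reverse likelihood comparison described in
# the abstract. Assumptions: GPT-2 as a stand-in model; the random baseline
# name "Jordan Ellsworth" is illustrative, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_log_likelihood(prompt: str, answer: str) -> float:
    """Sum of token log-probs of `answer` conditioned on `prompt`.

    Assumes `answer` starts with a space so the BPE tokenization of
    prompt + answer splits cleanly at the prompt boundary (true for GPT-2
    with the strings used below).
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    answer_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item() for pos in answer_positions
    )

# Forward direction: the order that appears in the (hypothetical) training data.
forward = answer_log_likelihood(
    "Uriah Hawthorne is the composer of", " Abyssal Melodies"
)
# Reverse direction: the Reversal Curse predicts the true name scores no
# higher here than a random name does.
reverse_true = answer_log_likelihood(
    "The composer of Abyssal Melodies is", " Uriah Hawthorne"
)
reverse_random = answer_log_likelihood(
    "The composer of Abyssal Melodies is", " Jordan Ellsworth"
)
print(f"forward: {forward:.2f}")
print(f"reverse (true name): {reverse_true:.2f}")
print(f"reverse (random name): {reverse_random:.2f}")
```

In the paper's actual setup, the model is first finetuned on many such fictitious "name is description" statements; the Reversal Curse is the finding that `reverse_true` then ends up no higher than `reverse_random`.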
- Lukas Berglund
- Meg Tong
- Max Kaufmann
- Mikita Balesni
- Asa Cooper Stickland
- Tomasz Korbak
- Owain Evans