Emergent Mind

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

(arXiv:2309.12288)
Published Sep 21, 2023 in cs.CL, cs.AI, and cs.LG

Abstract

We expose a surprising failure of generalization in auto-regressive LLMs. If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.


Overview

  • The paper identifies a phenomenon called the "Reversal Curse," wherein LLMs trained on statements like "A is B" fail to learn the reverse "B is A" accurately.

  • Experiments with both fictitious and real-world data show that LLMs such as GPT-3 and Llama-1 exhibit this curse, performing poorly on reversed queries.

  • Efforts to mitigate the Reversal Curse through varied training setups, including data augmentation and altered data formats, proved ineffective.

Overview of "The Reversal Curse: LLMs trained on A is B fail to learn B is A"

The paper "The Reversal Curse: LLMs trained on A is B fail to learn B is A" by Lukas Berglund et al. investigates a fundamental shortcoming in how auto-regressive LLMs generalize patterns from their training data. The authors identify the phenomenon termed the "Reversal Curse," wherein models trained on statements such as "A is B" do not generalize to the reversed form "B is A." This deficiency is examined through a series of experiments and is consistently observed across varied model sizes and types, including GPT-3 and Llama-1.

Key Findings

  1. Reversal Curse Identification: The study establishes that LLMs fail to logically deduce the reverse of learned facts. For instance, if a model is trained with "Olaf Scholz was the ninth Chancellor of Germany," it is unable to infer that "The ninth Chancellor of Germany was Olaf Scholz" with any greater likelihood than a random guess. This indicates a shortfall in the logical symmetry expected from such learning systems.
  2. Experimental Validation: Utilizing both fictitious and real-world data, the researchers confirm the robustness of the Reversal Curse:

    1. Fictitious Data: In one set of experiments, models were fine-tuned on synthetic facts (e.g., "Uriah Hawthorne is the composer of Abyssal Melodies") and tested for their ability to reverse these facts. The results showed that while models responded correctly when asked in the fine-tuned order, they performed no better than random guessing for the reversed queries.
    2. Real-World Data: Further testing with real-world questions about celebrities, such as "Who is Tom Cruise's mother?" and "Who is Mary Lee Pfeiffer's son?", yielded similar outcomes. While GPT-4 answered the former correctly 79% of the time, it could only answer the reversed query correctly 33% of the time.
  3. Ineffectiveness of Data Augmentation: The study also explored whether various training setups could mitigate the Reversal Curse. This involved different hyperparameters, inclusion of auxiliary examples, paraphrases, and altered data formats (e.g., converting statements into question-answer pairs). None of these interventions successfully alleviated the curse.
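The measurement behind findings 1 and 2 can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' released code: `exact_match_accuracy` mirrors the forward-vs-reverse accuracy comparison, and `correct_beats_random` mirrors the likelihood test (whether the trained name scores higher than random names under the model). The log-probabilities would come from the model under evaluation; here they are placeholder values.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of model answers matching the reference (case-insensitive)."""
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

def correct_beats_random(correct_logprob, random_logprobs):
    """Reversal Curse probe: does the trained name outscore random names?

    Under the curse, this fails for reversed prompts -- the correct name
    is no likelier than a random one.
    """
    return all(correct_logprob > lp for lp in random_logprobs)

# Placeholder numbers for illustration (not from the paper's data):
forward_acc = exact_match_accuracy(["Mary Lee Pfeiffer"], ["Mary Lee Pfeiffer"])
reverse_acc = exact_match_accuracy(["I don't know"], ["Tom Cruise"])
print(forward_acc, reverse_acc)  # 1.0 0.0

# For a reversed prompt, log P(correct name) vs. a few random names:
print(correct_beats_random(-12.3, [-11.8, -12.1, -12.0]))  # False
```

In the paper's fictitious-data experiments, the reverse-direction accuracy is near zero and the likelihood check fails, which is what distinguishes the Reversal Curse from ordinary forgetting.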

Implications

The findings have several far-reaching implications:

  • Logical Deduction in LLMs: The Reversal Curse underscores a significant gap in the logical reasoning capabilities of current LLMs. This has profound implications for their reliability and efficacy in applications requiring logical consistency.
  • Meta-Learning: The curse also points to limitations in the meta-learning abilities of LLMs. Despite the prevalence of reversed fact patterns in training data, models fail to adjust their probabilities appropriately, suggesting a fundamental flaw in how these models internalize and generalize information.
  • Model Design and Training Paradigms: The persistent nature of the curse across various models and configurations indicates a need for rethinking model architectures or training paradigms. Methods that enable models to appreciate and utilize the symmetry of logical relations are crucial for advancing the state of these systems.

Future Directions

  1. Further Investigation into Reversal of Relations: The paper suggests exploring whether the Reversal Curse extends to other types of logical relations beyond identity, such as implications or spatial relationships.
  2. Analysis of Training Data: Utilizing entity-linking techniques in pretraining datasets might help identify instances where information only appears in one direction, providing insights into mitigating the problem.
  3. Alternative Learning Models: Non-auto-regressive models or alternative paradigms for knowledge representation and learning might avoid the Reversal Curse, warranting further research in these areas.
  4. Practical Impact Assessment: Investigating the practical effects of the Reversal Curse on real-world deployments of LLMs can guide optimizations in training regimes, particularly for tasks involving sparse data representations.
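The entity-linking idea in direction 2 can be sketched as a toy scan for facts a corpus states in only one direction. This is a hypothetical illustration, not the paper's method: the `directional_cooccurrences` pair-counting scheme is our own, and naive substring matching stands in for a real entity linker.

```python
from collections import Counter

def directional_cooccurrences(corpus, entities):
    """Count ordered entity pairs (first-mentioned, second-mentioned) per sentence."""
    counts = Counter()
    for sentence in corpus:
        present = sorted((sentence.find(e), e) for e in entities if e in sentence)
        for i, (_, a) in enumerate(present):
            for _, b in present[i + 1:]:
                counts[(a, b)] += 1
    return counts

def one_directional_pairs(counts):
    """Pairs seen in one order but never the other -- candidate Reversal Curse cases."""
    return [(a, b) for (a, b), n in counts.items() if n > 0 and counts[(b, a)] == 0]

corpus = [
    "Uriah Hawthorne is the composer of Abyssal Melodies.",
    "Mary Lee Pfeiffer is the mother of Tom Cruise.",
    "Tom Cruise thanked Mary Lee Pfeiffer.",
]
entities = ["Uriah Hawthorne", "Abyssal Melodies", "Tom Cruise", "Mary Lee Pfeiffer"]
counts = directional_cooccurrences(corpus, entities)
print(one_directional_pairs(counts))  # [('Uriah Hawthorne', 'Abyssal Melodies')]
```

Facts flagged this way, appearing only in one direction during pretraining, are exactly the ones the Reversal Curse predicts a model will fail to retrieve in reverse.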

Conclusion

The Reversal Curse highlights a critical area of improvement for LLMs in terms of logical reasoning and generalized learning. Addressing this issue will be pivotal in enhancing the robustness and reliability of AI systems, particularly as their applications continue to expand into more complex and critical domains. Future research must delve deeper into the underpinnings of this phenomenon and explore innovative approaches to overcome it.
