Reverse Training to Nurse the Reversal Curse (2403.13799v3)
Abstract: Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even training on trillions of tokens does not resolve this issue because of Zipf's law, so it persists even if we train on the entire internet. This work proposes an alternative training scheme, called reverse training, whereby all words are used twice, doubling the amount of available tokens. The LLM is trained in both forward and reverse directions by reversing the training strings while preserving (i.e., not reversing) chosen substrings, such as entities. We show that data-matched reverse-trained models provide superior performance to standard models on standard tasks, and compute-matched reverse-trained models provide far superior performance on reversal tasks, helping to resolve the reversal curse issue.
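The entity-preserving reversal described in the abstract can be sketched in a few lines: each training string is reversed at the word level while chosen substrings (entities) are kept intact as single units. This is a minimal illustrative sketch, not the paper's actual preprocessing pipeline; the function name, the word-level granularity, and the hand-supplied entity list are assumptions made for the example.

```python
# Minimal sketch of entity-preserving reverse training data construction:
# reverse the word order of a training string while keeping chosen
# substrings (entities) intact. Illustrative only; the paper's pipeline
# may detect entities automatically (e.g., with an NER model).
import re
from typing import List


def reverse_with_entities(text: str, entities: List[str]) -> str:
    """Reverse word order, treating each listed entity as one unbreakable unit."""
    # Match longer entities first so overlapping names resolve correctly.
    entity_pattern = "|".join(
        re.escape(e) for e in sorted(entities, key=len, reverse=True)
    )
    pattern = f"({entity_pattern})" if entities else r"(\S+)"

    tokens: List[str] = []
    pos = 0
    for m in re.finditer(pattern, text):
        # Words before the entity are split individually.
        tokens.extend(text[pos:m.start()].split())
        # The entity itself is kept as-is (not reversed internally).
        tokens.append(m.group(0))
        pos = m.end()
    tokens.extend(text[pos:].split())

    return " ".join(reversed(tokens))


if __name__ == "__main__":
    sample = "Daphne Barrington is the director of A Journey Through Time"
    ents = ["Daphne Barrington", "A Journey Through Time"]
    print(reverse_with_entities(sample, ents))
    # -> "A Journey Through Time of director the is Daphne Barrington"
```

A training corpus would then contain both the forward string and its reversed counterpart, which is how reverse training uses every word twice.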
Authors:
- Olga Golovneva
- Zeyuan Allen-Zhu
- Jason Weston
- Sainbayar Sukhbaatar