Linking In-context Learning in Transformers to Human Episodic Memory (2405.14992v2)
Abstract: Understanding connections between artificial and biological intelligent systems can reveal fundamental principles of general intelligence. While many artificial intelligence models have neuroscience counterparts, such connections are largely missing in Transformer models and the self-attention mechanism. Here, we examine the relationship between interacting attention heads and human episodic memory. We focus on induction heads, which contribute to in-context learning in Transformer-based large language models (LLMs). We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval (CMR) model of human episodic memory. Our analyses of LLMs pre-trained on extensive text data show that CMR-like heads often emerge in the intermediate and late layers, qualitatively mirroring human memory biases. Ablating CMR-like heads suggests their causal role in in-context learning. Our findings uncover a parallel between the computational mechanisms of LLMs and human memory, offering valuable insights into both research fields.
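The two mechanisms the abstract links can be made concrete with a short sketch. The code below is illustrative only and is not the paper's implementation; the function names, the toy token example, and the default `beta` are assumptions made for the example. It shows the [A][B] ... [A] -> [B] copy rule commonly attributed to induction heads, alongside the unit-norm temporal context update c_t = rho * c_{t-1} + beta * c_in used in TCM/CMR, whose slowly drifting context states produce the temporal-contiguity bias in human free recall.

```python
# Minimal illustrative sketch (not the paper's code): the induction-head
# copy rule and a CMR/TCM-style drifting temporal context, side by side.
import numpy as np


def induction_head_prediction(tokens):
    """Copy rule associated with induction heads: find the most recent
    earlier occurrence of the current token and predict what followed it."""
    query = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == query:
            return tokens[i + 1]
    return None


def cmr_context_drift(item_vectors, beta=0.5):
    """Evolve a temporal context c_t = rho * c_{t-1} + beta * c_in, with rho
    chosen so that the context vector stays unit-length (the TCM/CMR update)."""
    dim = item_vectors.shape[1]
    c = np.zeros(dim)
    c[0] = 1.0  # arbitrary unit-length starting context
    states = []
    for c_in in item_vectors:
        c_in = c_in / np.linalg.norm(c_in)
        dot = float(c @ c_in)
        rho = np.sqrt(1.0 + beta**2 * (dot**2 - 1.0)) - beta * dot
        c = rho * c + beta * c_in
        states.append(c.copy())
    return np.stack(states)


if __name__ == "__main__":
    # Induction head: having seen "... the cat ...", predict "cat" after "the".
    print(induction_head_prediction("the cat sat on the".split()))  # -> "cat"

    # CMR: context drifts slowly, so items studied close together acquire
    # correlated context states, the source of the temporal-contiguity bias.
    rng = np.random.default_rng(0)
    states = cmr_context_drift(rng.normal(size=(5, 8)))
    print(np.round(states @ states.T, 2))  # similarity falls off with lag
```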