
On the Origins of Linear Representations in Large Language Models (2403.03867v1)

Published 6 Mar 2024 in cs.CL, cs.LG, and stat.ML

Abstract: Recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of LLMs. In this work, we study the origins of such linear representations. To that end, we introduce a simple latent variable model to abstract and formalize the concept dynamics of the next token prediction. We use this formalism to show that the next token prediction objective (softmax with cross-entropy) and the implicit bias of gradient descent together promote the linear representation of concepts. Experiments show that linear representations emerge when learning from data matching the latent variable model, confirming that this simple structure already suffices to yield linear representations. We additionally confirm some predictions of the theory using the LLaMA-2 LLM, giving evidence that the simplified model yields generalizable insights.


Summary

  • The paper establishes that linear representations in LLMs arise from log-odds matching and the implicit bias of gradient descent during next-token prediction.
  • It introduces a latent variable model in which human-interpretable binary concepts, structured as a Markov random field, govern both contexts and next tokens.
  • Empirical validations show that gradient descent aligns similar concept vectors while maintaining orthogonal separations, enhancing model interpretability.

On the Origins of Linear Representations in LLMs

Introduction

The paper "On the Origins of Linear Representations in LLMs" (2403.03867) explores the phenomenon where high-level semantic concepts are encoded linearly within the vector space of LLMs. This linearity, observed empirically in models like LLaMA-2, raises fundamental questions about how such representations originate during the training process. The authors introduce a novel mathematical framework to explore the dynamics of next-token prediction, proposing that both the objective functions used in training and the implicit biases in gradient descent contribute to this linear structuring.

Latent Variable Model for Concept Dynamics

The study introduces a latent variable model where context sentences and next tokens reflect latent binary concept variables. This model encapsulates human-interpretable concepts within the latent space, allowing for an analytical exploration of how concepts are represented in embeddings and unembeddings.

Latent Space and Mapping: The model defines a set of binary variables representing different concepts, structured as a Markov random field. Each context maps to these concept variables, introducing a probabilistic component to next-token prediction. The injective mapping from latent space to token space ensures each concept combination is uniquely represented in the output space.
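A minimal sketch of this data model, under simplifying assumptions: independent Bernoulli concepts (an edgeless Markov random field), and a binary-to-integer token map that is our illustrative choice of injective mapping, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the latent space: m binary concept variables. The paper models
# their joint distribution as a Markov random field; for illustration we
# assume independent Bernoulli concepts (an edgeless MRF).
m = 4                       # number of latent binary concepts (our choice)
p = np.full(m, 0.5)         # per-concept activation probabilities

def sample_concepts(n):
    """Draw n latent concept vectors c in {0, 1}^m."""
    return (rng.random((n, m)) < p).astype(int)

def concepts_to_token(c):
    """Injective map from a concept combination to a token id: read the
    binary vector as an integer, so each of the 2^m combinations gets a
    unique token."""
    return int(c @ (1 << np.arange(m)))

samples = sample_concepts(3)                     # three latent draws
tokens = [concepts_to_token(c) for c in samples]
```

Injectivity of the map mirrors the model's requirement that each concept combination be uniquely represented in the output space.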

Linearity through Log-Odds and Gradient Descent

The research establishes that linear representations stem from two primary sources:

Log-Odds Matching: The paper first demonstrates that when the learned conditional probabilities match the true log-odds across concepts, a linear structure arises in subspaces of the representation space. This finding is consistent with prior evidence from studies of word embeddings.
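One hedged way to see why log-odds matching yields parallelism (the symbols $\lambda(x)$ for the context embedding and $\gamma(y)$ for a token unembedding are our notational choices, and the argument below is a paraphrase, not the paper's exact statement):

```latex
% Softmax next-token model:
%   p(y \mid x) \propto \exp\!\big(\lambda(x)^\top \gamma(y)\big).
% The log-odds between two tokens reduce to an inner product:
\log \frac{p(y_1 \mid x)}{p(y_0 \mid x)}
   = \lambda(x)^\top \big(\gamma(y_1) - \gamma(y_0)\big).
% If (y_0, y_1) and (y_0', y_1') are counterfactual pairs differing only
% in the same binary concept, and the latent model makes that concept's
% log-odds independent of the remaining concepts, then matching the true
% log-odds for every context x forces
%   \lambda(x)^\top \big(\gamma(y_1) - \gamma(y_0)\big)
%     = \lambda(x)^\top \big(\gamma(y_1') - \gamma(y_0')\big)
% for all x. When the context embeddings span the representation space,
% the two difference vectors must coincide, i.e. the concept is encoded
% by a single linear direction.
```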

Implicit Bias of Gradient Descent: Extending beyond log-odds matching, the authors link the linear encoding of representations to the bias induced by gradient descent. They show theoretically how gradient descent promotes the alignment of concept vectors over successive training iterations.

Figure 1: Unembedding steering vectors of the same concept in LLaMA-2 have nontrivial alignment, but steering vectors of different concepts are represented almost orthogonally.

Orthogonality and Semantic Structures

The study further examines how unrelated concepts, those not linked in the Markov random field, exhibit orthogonal representations. This orthogonality, unaccounted for by standard training objectives, emerges as a byproduct of the learned Euclidean geometry in LLMs, which inadvertently respects semantic distances.

Euclidean Geometry: The implicit bias of gradient descent not only causes concept alignment within representations but also ensures that disparate concepts maintain orthogonal separations, refining how semantics are geometrically encoded.
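A hypothetical diagnostic in the spirit of Figure 1: given counterfactual token pairs, form normalized steering vectors (unembedding differences) and compare average |cosine| within a concept against across concepts. The unembedding matrix below is synthetic, constructed to exhibit the predicted geometry, and `steering_vectors` / `mean_abs_cosine` are our illustrative helpers, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def steering_vectors(unembed, pairs):
    """Difference vectors gamma(y1) - gamma(y0) for counterfactual token
    pairs (y0, y1) that differ only in one binary concept."""
    return np.stack([unembed[y1] - unembed[y0] for y0, y1 in pairs])

def mean_abs_cosine(A, B):
    """Average |cosine similarity| between rows of A and rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float(np.mean(np.abs(A @ B.T)))

# Synthetic unembedding matrix built to exhibit the predicted geometry:
# pairs for concept 1 share direction d1, pairs for concept 2 share d2,
# and d1 is orthogonal to d2.
d = 64
d1, d2 = np.zeros(d), np.zeros(d)
d1[0], d2[1] = 1.0, 1.0
base = rng.normal(size=(8, d)) * 0.01               # 8 "base" tokens
unembed = np.vstack([base, base + d1, base + d2])
unembed += rng.normal(size=unembed.shape) * 0.01    # small perturbation

pairs_c1 = [(i, i + 8) for i in range(8)]    # differ only in concept 1
pairs_c2 = [(i, i + 16) for i in range(8)]   # differ only in concept 2

S1 = steering_vectors(unembed, pairs_c1)
S2 = steering_vectors(unembed, pairs_c2)

within = mean_abs_cosine(S1, S1)   # high: same-concept vectors align
across = mean_abs_cosine(S1, S2)   # low: different concepts ~orthogonal
```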

Empirical Validation and Theoretical Insights

Extensive experiments using simulated data from the proposed latent model confirm the emergence of linear and orthogonal representations. These results are consistent even when varying the set of contexts or latent dimensions, highlighting the robustness of the theoretical claims.
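The flavor of these simulations can be sketched as follows. This is a drastically simplified stand-in, not the paper's actual setup: independent Bernoulli concepts, context embeddings that are noisy copies of the concept vector, one token per concept combination, and full-batch gradient descent on cross-entropy; every name and constant is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

m, d, n, steps, lr = 3, 16, 2000, 500, 0.5
V = 2 ** m                                          # vocabulary size

C = (rng.random((n, m)) < 0.5).astype(float)        # latent concepts
X = np.hstack([C, np.zeros((n, d - m))])            # embed into R^d ...
X += 0.1 * rng.normal(size=(n, d))                  # ... with noise
y = (C @ (1 << np.arange(m))).astype(int)           # injective token map

W = 0.01 * rng.normal(size=(V, d))                  # unembedding matrix

for _ in range(steps):                              # full-batch GD on CE
    logits = X @ W.T
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    P[np.arange(n), y] -= 1.0                       # dLoss/dlogits
    W -= lr * (P.T @ X) / n

# Steering vectors for concept 0: token pairs differing only in bit 0.
S = np.stack([W[t + 1] - W[t] for t in range(0, V, 2)])
S = S / np.linalg.norm(S, axis=1, keepdims=True)
alignment = float(np.mean(S @ S.T))                 # near 1 if parallel
```

After training, the normalized steering vectors for a given concept become nearly parallel, echoing the emergence of linear representations reported for the full latent variable model.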

In practical scenarios with LLMs like LLaMA-2, empirical observations support the presence of these geometric structures. Notably, the context and concept vectors align as predicted, substantiating the theoretical models.

Conclusion

This research provides a cohesive explanation for the linearity observed in LLM representations, attributing it to the next-token prediction objective and the implicit bias of gradient descent. The latent variable model introduced here offers a tractable framework for interpreting such phenomena, with implications for improving model interpretability and guiding future development. By combining theory with empirical validation, the study deepens our understanding of the geometric principles governing large-scale models.
