
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT (2310.07582v2)

Published 11 Oct 2023 in cs.LG and cs.AI

Abstract: Foundation models exhibit significant capabilities in decision-making and logical deductions. Nonetheless, a continuing discourse persists regarding their genuine understanding of the world as opposed to mere stochastic mimicry. This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process. This paper further elucidates the interplay between the linear world representation and causal decision-making, and their dependence on layer depth and model complexity. We have made the code public.
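The core technique behind claims like these is linear probing: training a linear classifier on a model's hidden activations to test whether some property (here, board-square state) is linearly decodable. The sketch below is a minimal, self-contained illustration of that idea, not the paper's code: the "activations" and board labels are synthetic stand-ins generated so that a linear direction exists by construction, whereas the actual study probes Othello-GPT's residual-stream activations at each layer.

```python
import numpy as np

# Synthetic stand-in for transformer hidden states:
# n board positions, each with a d-dimensional activation vector.
# (The real probe would use Othello-GPT activations instead.)
rng = np.random.default_rng(0)
n, d = 2000, 64
acts = rng.normal(size=(n, d))

# Assume, for illustration, a ground-truth linear direction that
# encodes "this square holds an opposing piece".
true_dir = rng.normal(size=d)
labels = (acts @ true_dir > 0).astype(float)

# Linear probe: logistic regression fit by plain gradient descent.
w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(500):
    logits = acts @ w + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    grad_w = acts.T @ (probs - labels) / n
    grad_b = (probs - labels).mean()
    w -= lr * grad_w
    b -= lr * grad_b

preds = (acts @ w + b > 0).astype(float)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.3f}")
```

High probe accuracy indicates the property is linearly represented; the paper goes further by intervening along such directions to show the representation causally steers move prediction, and by measuring how probe accuracy varies with layer depth and model size.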
