Understanding Addition in Transformers (2310.13121v9)

Published 19 Oct 2023 in cs.LG and cs.AI

Abstract: Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer addition. Our findings suggest that the model dissects the task into parallel streams dedicated to individual digits, employing varied algorithms tailored to different digit positions. Furthermore, we identify and explain a rare scenario characterized by high loss. By thoroughly elucidating the model's algorithm, we provide new insights into its functioning. These findings are validated through rigorous testing and mathematical modeling, thereby contributing to the broader fields of model understanding and interpretability. Our approach opens the door to analyzing more complex tasks and multi-layer Transformer models.
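
For concreteness, below is a minimal sketch (not the authors' released code) of how an n-digit integer addition dataset for a character-level Transformer might be generated. Each example is serialized as a fixed-width string such as "12345+67890=080235", with the answer zero-padded to n+1 digits so a final carry never changes the sequence length; the separator token, padding, and field widths here are illustrative assumptions rather than the paper's specification.

    import random

    def make_example(n_digits: int = 5) -> str:
        """Return one addition problem as a fixed-width character sequence."""
        # Uniformly random operands in [0, 10^n); leading zeros are kept
        # so every operand occupies exactly n_digits character positions.
        a = random.randrange(10 ** n_digits)
        b = random.randrange(10 ** n_digits)
        # Zero-pad the sum to n_digits + 1 characters so sequences with and
        # without a final carry have identical length and token positions.
        return f"{a:0{n_digits}d}+{b:0{n_digits}d}={a + b:0{n_digits + 1}d}"

    if __name__ == "__main__":
        random.seed(0)
        for _ in range(3):
            print(make_example())

Fixed-width serialization keeps each digit of the question and the answer at a constant token position, which is what makes the kind of per-digit, per-position analysis described in the abstract straightforward to set up.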
