Arithmetic with Language Models: from Memorization to Computation (2308.01154v4)
Abstract: A better understanding of the emergent computation and problem-solving capabilities of recent LLMs is of paramount importance to further improve them and broaden their applicability. This work investigates how an LLM, trained to predict the next token, can perform arithmetic computations that generalize beyond the training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities, making smooth input interpolation ineffective for novel data. We successfully trained a lightweight LLM to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing. Our findings support the hypothesis that the LLM works as an Encoding-Regression-Decoding machine, where the computation takes place in value space once the input token representation is mapped to an appropriate internal representation.
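To make the task setup concrete, below is a minimal sketch of how binary addition and multiplication can be cast as next-token-prediction sequences over a tiny vocabulary, as the abstract describes. The exact token layout ("A op B = C"), the operand bit-width, and the helper names are illustrative assumptions, not the paper's actual data pipeline.

```python
# Illustrative sketch (not the paper's exact pipeline): build next-token-prediction
# examples for binary addition and multiplication with a very small vocabulary.
# The "A op B = C <eos>" layout and the fixed operand width are assumptions.

import random

VOCAB = ["0", "1", "+", "*", "=", "<eos>"]  # small vocabulary, as noted in the abstract
TOKEN_TO_ID = {t: i for i, t in enumerate(VOCAB)}


def to_bits(n: int, width: int) -> list[str]:
    """Fixed-width binary representation, most significant bit first."""
    return list(format(n, f"0{width}b"))


def make_example(width: int = 8, op: str = "+") -> list[str]:
    """One training sequence: operand bits, operator, '=', result bits, end marker."""
    a = random.randrange(2 ** width)
    b = random.randrange(2 ** width)
    result = a + b if op == "+" else a * b
    # Addition of two width-bit numbers needs at most width + 1 bits;
    # multiplication needs at most 2 * width bits.
    result_width = width + 1 if op == "+" else 2 * width
    return to_bits(a, width) + [op] + to_bits(b, width) + ["="] + to_bits(result, result_width) + ["<eos>"]


def encode(tokens: list[str]) -> list[int]:
    """Map tokens to integer ids for a language model's embedding layer."""
    return [TOKEN_TO_ID[t] for t in tokens]


if __name__ == "__main__":
    random.seed(0)
    seq = make_example(width=4, op="+")
    print(" ".join(seq))   # prints the token sequence, e.g. bits, '+', bits, '=', result bits
    print(encode(seq))     # prints the corresponding id sequence
```

Training on such sequences with a standard next-token objective lets one test extrapolation by holding out specific operand combinations, which is the kind of generalization probe the abstract refers to.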