
Transformers Can Do Arithmetic with the Right Embeddings

(2405.17399)
Published May 27, 2024 in cs.LG and cs.AI

Abstract

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that training on only 20 digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100 digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.

Figure: Accuracy of looped transformer models on multiplication, showing improvements with Abacus Embeddings and FIRE.

Overview

  • The paper introduces Abacus Embeddings, a novel positional embedding technique for transformer models that enhances their performance on arithmetic tasks by improving positional representations of digits.

  • Architectural enhancements such as input injection and the integration of recurrent layers are implemented alongside Abacus Embeddings, leading to significant boosts in model accuracy and generalization across arithmetic operations like addition and multiplication.

  • The enhancements extend beyond arithmetic to other algorithmic reasoning tasks, such as sorting, and suggest potential for broader applications and future research into combined embedding strategies and further refined positional embeddings.

Enhanced Arithmetic Capabilities in Transformers through Abacus Embeddings and Recurrence

The paper investigates the inherent challenges faced by transformer models, particularly in the context of arithmetic tasks, and proposes a solution to address these deficits. The primary contribution involves introducing Abacus Embeddings, which significantly improve positional representations of digits, and integrating recurrent layers to enhance the transformer’s reasoning capabilities.

Core Contributions and Methodologies

The authors identify that transformers struggle with arithmetic because they have difficulty maintaining exact positional information about digits within long sequences. To remedy this, they propose Abacus Embeddings, a positional embedding technique that encodes each digit's position relative to the start of its number. Unlike traditional positional embeddings, this assigns identical embeddings to digits of the same significance, making explicit the digit alignment that column-wise arithmetic requires.
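
To make this concrete, below is a minimal PyTorch sketch of how such digit-relative position indices could be computed and embedded. It assumes each digit is its own token; all function and variable names are illustrative, and the single shared random offset (used so the model sees large indices despite short training numbers) is a simplification of whatever offset scheme the paper actually uses. If operands are written least-significant-digit first, the same index then corresponds to the same digit significance.

```python
import torch
import torch.nn as nn

def abacus_position_ids(token_ids, digit_token_ids, max_offset=100, training=True):
    """Give every digit token a 1-based index counting from the start of its
    contiguous run of digits; non-digit tokens get index 0. Plain Python loops
    are used for clarity, not speed."""
    is_digit = torch.isin(token_ids, digit_token_ids)      # (batch, seq) bool
    pos = torch.zeros_like(token_ids)
    for b in range(token_ids.size(0)):
        run = 0
        for t in range(token_ids.size(1)):
            run = run + 1 if is_digit[b, t] else 0
            pos[b, t] = run
    if training:
        # one shared random shift so large indices are seen during training
        offset = torch.randint(0, max_offset, (1,), device=token_ids.device)
        pos = torch.where(is_digit, pos + offset, pos)
    return pos

class AbacusEmbedding(nn.Module):
    """Learned embedding table indexed by the digit-relative positions above;
    its output is added to the token embeddings."""
    def __init__(self, max_positions, d_model):
        super().__init__()
        self.emb = nn.Embedding(max_positions, d_model)

    def forward(self, position_ids):
        return self.emb(position_ids)

# Hypothetical usage, assuming digit_ids holds the vocabulary ids of '0'..'9':
# pos = abacus_position_ids(batch_token_ids, digit_ids)
# hidden = token_embedding(batch_token_ids) + AbacusEmbedding(256, 512)(pos)
```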

Key Insights and Numerical Results

Abacus Embeddings:

  • These embeddings substantially boost transformer performance on arithmetic. Models trained with Abacus Embeddings generalize to addition problems of up to 120 digits, a 6x extrapolation factor relative to the training distribution and a clear improvement over the previous state of the art of 2.5x.
  • Models utilizing Abacus Embeddings reached up to 99% accuracy on 100-digit addition problems.

Architectural Enhancements:

  • Input Injection: Skip connections that propagate the embedded input into each transformer layer reduce generalization error by 50% when combined with Abacus Embeddings.
  • Recurrent Layers: Looping a block of transformer layers yields notable gains on multi-step reasoning tasks; the looped transformer combined with Abacus Embeddings shows near-perfect generalization on long addition problems.
  • Together, these modifications raise out-of-distribution accuracy from 92.9% to 99.1%, an 87% reduction in error relative to using the embeddings with standard architectures. A minimal sketch of both components appears after this list.
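
The following PyTorch sketch illustrates the two architectural ideas named above: a block of layers applied recurrently, with the embedded input re-added on every pass. All class names, layer counts, and recurrence depths are placeholders chosen for illustration; the paper describes injecting the input at each layer, which is approximated here at the block level for brevity.

```python
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """Illustrative looped transformer with input injection (not the paper's exact architecture)."""
    def __init__(self, d_model=512, n_heads=8, layers_per_block=4, recurrences=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.block = nn.TransformerEncoder(layer, num_layers=layers_per_block)
        self.recurrences = recurrences

    def forward(self, input_embeddings):
        # causal mask so the model remains autoregressive
        seq_len = input_embeddings.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(input_embeddings.device)
        h = input_embeddings
        for _ in range(self.recurrences):
            # input injection: skip connection from the embedded input on every recurrence
            h = self.block(h + input_embeddings, mask=causal_mask)
        return h
```

The effective depth is layers_per_block x recurrences while only layers_per_block layers' worth of parameters are stored, which is the weight-sharing trade-off a recurrent design exploits.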

Extended Implications for Algorithmic Reasoning

The success of Abacus Embeddings extends beyond addition to other algorithmic reasoning tasks like multiplication and sorting.

Multiplication:

  • Transformers augmented with Abacus Embeddings achieved near-perfect accuracy when tested on multiplication problems involving operands of up to 15 digits.
  • The performance remains robust even as complexity increases, highlighting the embeddings’ capabilities in handling more intricate arithmetic tasks.

Sorting:

  • The paper also explores sorting, presenting the model with arrays of variable-length numbers. Abacus Embeddings improve the model's ability to generalize across these scenarios, outperforming other embeddings on generalization splits.
  • Different architectural setups (standard transformer, transformer with input injection, and looped transformer) show varied results; looped transformers are particularly good at identifying the minimum element of the array during extrapolation (a hypothetical example of the task format follows this list).
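
As a purely illustrative aside, a sorting instance of the kind described above might be generated as follows; the comma-separated serialization is an assumption, since the summary does not specify the exact input format used in the paper.

```python
import random

def make_sorting_example(max_len=8, max_digits=4):
    """Hypothetical generator for a variable-length sorting problem."""
    n = random.randint(2, max_len)
    numbers = [random.randint(0, 10 ** random.randint(1, max_digits) - 1) for _ in range(n)]
    prompt = "sort: " + ", ".join(map(str, numbers)) + " ="
    target = ", ".join(map(str, sorted(numbers)))
    return prompt, target

# e.g. ("sort: 7, 1042, 33 =", "7, 33, 1042")
```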

Future Prospects and Implications

This study advances the understanding of transformer capabilities in performing arithmetic and algorithmic reasoning tasks. The findings open several avenues for future research:

Integration with General-Purpose Models:

  • Combining Abacus Embeddings with embeddings better suited to natural language, such as Rotary Position Embeddings (RoPE) and Functional Interpolation for Relative Position Embeddings (FIRE), shows substantial promise. Such a combination could yield an embedding strategy that retains strong arithmetic performance while remaining effective on broader NLP tasks.

Broader Range of Algorithmic Tasks:

  • Extending the current approach to a more diverse set of algorithmic reasoning challenges can help in developing more versatile models and enhance the ability of transformers to generalize in increasingly complex scenarios.

Improved Positional Embedding Strategies:

  • Future research might explore further refinements in positional embeddings, especially those that facilitate better length generalization without significant computational overhead.

In conclusion, the paper presents a noteworthy advance in improving transformer models' performance on arithmetic tasks through the introduction of Abacus Embeddings and recurrent architectures. These techniques not only achieve significant performance gains but also demonstrate promising transferability to other complex algorithmic procedures, paving the way for more practical and theoretically robust applications in AI.
