Compression Represents Intelligence Linearly

(2404.09937)
Published Apr 15, 2024 in cs.CL, cs.AI, cs.IT, cs.LG, and math.IT

Abstract

There is a belief that learning to compress well will lead to intelligence. Recently, language modeling has been shown to be equivalent to compression, which offers a compelling rationale for the success of LLMs: the development of more advanced language models is essentially enhancing compression which facilitates intelligence. Despite such appealing discussions, little empirical evidence is present for the interplay between compression and intelligence. In this work, we examine their relationship in the context of LLMs, treating LLMs as data compressors. Given the abstract concept of "intelligence", we adopt the average downstream benchmark scores as a surrogate, specifically targeting intelligence related to knowledge and commonsense, coding, and mathematical reasoning. Across 12 benchmarks, our study brings together 30 public LLMs that originate from diverse organizations. Remarkably, we find that LLMs' intelligence -- reflected by average benchmark scores -- almost linearly correlates with their ability to compress external text corpora. These results provide concrete evidence supporting the belief that superior compression indicates greater intelligence. Furthermore, our findings suggest that compression efficiency, as an unsupervised metric derived from raw text corpora, serves as a reliable evaluation measure that is linearly associated with the model capabilities. We open-source our compression datasets as well as our data collection pipelines to facilitate future researchers to assess compression properly.

Figure: average benchmark score versus compression performance across evaluation datasets, including NaturalQuestions and ARC-Challenge.

Overview

  • The paper investigates the correlation between compression capability and intelligence in LLMs, suggesting that the ability to compress data might indicate a model's intelligence.

  • It examines 30 public LLMs across 12 benchmarks to explore the relationship between compression efficiency and performance on tasks requiring knowledge, commonsense, coding, and mathematical reasoning.

  • Findings reveal a near-linear correlation between compression efficiency and task performance, with a Pearson correlation coefficient around -0.95, suggesting compression could serve as a measure of LLM intelligence.

  • The study highlights the need for further research on the role of compression in LLM evaluation and outlines future directions, including the impact of different compression corpora and the minimum corpus size needed for reliable bits-per-character (BPC) computation.

Exploring the Correlation Between Compression and Intelligence in LLMs

Introduction

The correlation between compression capability and perceived intelligence in LLMs has long been a topic of theoretical discussion within the AI community. Leveraging insights from compression theory, this paper empirically investigates that correlation, positing that the ability of LLMs to compress external text corpora could serve as an indicator of their intelligence. Intelligence, for the purposes of this study, is operationalized as performance across a range of downstream tasks covering knowledge and commonsense, coding, and mathematical reasoning. The study examines 30 public LLMs across 12 benchmarks to test these theoretical claims.

Background

The equivalence between language modeling and compression stems from the premise that efficient prediction models can be converted into efficient lossless compressors, and vice versa. The paper succinctly outlines the foundational theory underpinning this relationship, focusing on the source coding theorem and on arithmetic coding as a practical scheme for lossless data compression. It extends this theory to language models, highlighting their potential to serve as general-purpose compressors, provided they can minimize the average code length required to represent data.
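
Concretely, arithmetic coding turns a model's next-token probabilities into a code whose length is essentially the model's negative log-likelihood, which is what the bits-per-character (BPC) metric captures. The following is a minimal statement of this relationship; the notation (a model p_theta over tokens x_1..x_N of a text with C characters) is ours, not quoted from the paper.

```latex
% Code length achievable with arithmetic coding driven by a model p_\theta
% (within a small constant of the model's negative log-likelihood):
L(x_{1:N}) \approx -\sum_{i=1}^{N} \log_2 p_\theta\!\left(x_i \mid x_{<i}\right) \ \text{bits}

% Normalizing by the character count C of the original text gives bits per character:
\mathrm{BPC} = \frac{1}{C}\left( -\sum_{i=1}^{N} \log_2 p_\theta\!\left(x_i \mid x_{<i}\right) \right)
```

A better language model, i.e., one assigning higher likelihood to the corpus, therefore yields a shorter code and a lower BPC.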

Methodology

The paper takes a meticulous approach to validating the theoretical compression-intelligence correlation in LLMs. An extensive array of models spanning varied sizes, architectures, and originating organizations was assessed. Intelligence evaluations were grounded in model performance on downstream tasks selected to cover areas central to AI applications today: knowledge and commonsense, coding, and mathematical reasoning. Compression efficiency was quantified as bits per character (BPC), computed with the context window size aligned to the one used in benchmark evaluation for each LLM. The diversity of the models assessed and the care taken to match context window sizes across tasks were crucial for drawing generalizable conclusions.
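
The authors open-source their compression corpora and collection pipeline; the snippet below is only a minimal sketch of how BPC can be estimated for a causal LM via the Hugging Face transformers API. The model name, context window, and corpus variable are illustrative placeholders, not the paper's exact evaluation setup.

```python
# Minimal BPC sketch (MODEL_NAME, CONTEXT_WINDOW, and corpus_text are placeholders).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"       # placeholder; any causal LM on the Hub works
CONTEXT_WINDOW = 1024     # placeholder; align with the model's evaluation context size

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def corpus_bpc(corpus_text: str) -> float:
    """Bits per character of `corpus_text` under the model's next-token distribution."""
    token_ids = tokenizer(corpus_text, return_tensors="pt").input_ids[0]
    total_nats = 0.0
    with torch.no_grad():
        # Each chunk is scored independently (no sliding-window overlap), which
        # keeps the sketch short at some cost in accuracy near chunk boundaries.
        for start in range(0, token_ids.size(0) - 1, CONTEXT_WINDOW):
            chunk = token_ids[start:start + CONTEXT_WINDOW].unsqueeze(0)
            # `loss` is the mean next-token cross-entropy (in nats) over the chunk.
            loss = model(chunk, labels=chunk).loss
            total_nats += loss.item() * (chunk.size(1) - 1)
    total_bits = total_nats / math.log(2)   # nats -> bits
    return total_bits / len(corpus_text)    # normalize by character count
```

In practice one would batch chunks and carry context across chunk boundaries with a sliding window; this version trades fidelity for brevity.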

Results

The study identifies a near-linear correlation between LLMs' compression efficiency and their performance on downstream tasks, with a Pearson correlation coefficient consistently around -0.95 across different intelligence domains. This correlation was substantiated across different models and benchmarks, establishing a robust link that transcends model size, architecture, and training data differences. Remarkably, this pattern persisted even when examining individual benchmarks, suggesting that compression efficiency could predict performance with considerable accuracy.
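
As a minimal sketch of how such a relationship can be quantified (not the authors' analysis code), the function below takes one BPC value and one average benchmark score per model and reports the Pearson coefficient together with a least-squares linear fit.

```python
# Sketch: correlating per-model compression (BPC) with average benchmark score.
# The input arrays stand in for real per-model measurements.
import numpy as np
from scipy import stats

def compression_vs_score(bpc: np.ndarray, avg_score: np.ndarray):
    """Pearson correlation and linear fit between BPC and average benchmark score.

    Lower BPC (better compression) should track higher scores, so a strongly
    negative Pearson coefficient indicates the near-linear relationship.
    """
    pearson_r, _ = stats.pearsonr(bpc, avg_score)
    slope, intercept = np.polyfit(bpc, avg_score, deg=1)   # score ≈ slope * BPC + intercept
    residuals = np.polyval([slope, intercept], bpc) - avg_score
    rmse = float(np.sqrt(np.mean(residuals ** 2)))         # goodness of the linear fit
    return pearson_r, slope, intercept, rmse
```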

Discussion

The findings from this research offer compelling empirical evidence for the long-held belief that a model's ability to compress data is significantly correlated with its performance on tasks that require intelligence. This not only reinforces the theoretical frameworks that position compression as central to intelligent behavior but also carries practical implications for the evaluation of LLMs. Compression efficiency as an unsupervised metric for estimating LLM performance is particularly promising given the challenges of benchmark overfitting and contamination of evaluation datasets.

Future Directions

While the paper provides substantial evidence for the correlation between compression and intelligence, it also opens several avenues for future research. Among these are the behavior of this correlation in fine-tuned models, the impact of different compression corpora on the observed relationship, and the minimum corpus size necessary for reliable BPC computation. It also invites further investigation into tasks requiring cross-domain abilities, suggesting that compression across diverse datasets might offer a more holistic view of a model's intelligence.

In conclusion, this study substantiates the theoretical premise that superior compression signifies greater intelligence in LLMs, advocating for compression efficiency as a viable metric for LLM evaluation. By empirically establishing this correlation across a wide array of models and benchmarks, the paper lays a foundation for both theoretical and practical advances in understanding and assessing the intelligence of language models.
