
A Fast Transformer-based General-Purpose Lossless Compressor

(2203.16114)
Published Mar 30, 2022 in cs.LG, cs.IT, and math.IT

Abstract

Deep-learning-based compressors have attracted interest recently due to their much improved compression ratios. However, modern approaches suffer from long execution times. To ease this problem, this paper targets cutting down the execution time of deep-learning-based compressors. Building history dependencies sequentially (e.g., with recurrent neural networks) is responsible for long inference latency. Instead, we introduce the transformer into deep-learning compressors to build history dependencies in parallel. However, the existing transformer is too heavy in computation and incompatible with compression tasks. This paper proposes a fast general-purpose lossless compressor, TRACE, by designing a compression-friendly structure based on a single-layer transformer. We first design a new metric to guide the selection of compression model structures. Byte-grouping and Shared-ffn schemes are further proposed to fully utilize the capacity of the single-layer transformer. These features allow TRACE to achieve a competitive compression ratio at a much faster speed. In addition, we further accelerate the compression procedure by designing a controller to reduce the parameter-updating overhead. Experiments show that TRACE achieves an overall ~3x speedup while keeping a compression ratio comparable to state-of-the-art compressors. The source code for TRACE and links to the datasets are available at https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor.

Overview

  • TRACE is a transformer-based, lossless data compressor designed to enhance compression speeds while maintaining competitive compression ratios.

  • Innovations in TRACE include a single-layer transformer optimized for lossless compression, the Latency-to-Compression Ratio (LCR) metric, Byte-Grouping, and Shared Feed-Forward Networks (Shared-ffn).

  • TRACE demonstrates superior performance in terms of computational efficiency and compression effectiveness, outperforming traditional compressors across various datasets and data types.

A Transformer-based Solution for Fast, General-Purpose Lossless Data Compression

Data compression remains a critical challenge in computing due to the burgeoning volume of data processed by myriad devices and services. While deep-learning-based compressors have demonstrated substantial improvements in compression ratios, they often suffer from increased execution times. This paper presents TRACE, an efficient, transformer-based, lossless compressor designed to address this shortfall. TRACE leverages a streamlined transformer architecture, specifically tailored for compression tasks, to significantly enhance compression speeds while maintaining competitive compression ratios.
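
To make the pipeline concrete, the sketch below shows the loop that virtually all learned lossless compressors share: an autoregressive model assigns a probability distribution to each next byte, and an entropy coder (arithmetic coding in practice) charges -log2(p) bits for the byte that actually occurs, so better predictions mean fewer bits. The function names and the uniform stand-in model are illustrative placeholders, not TRACE's actual interface.

```python
# Minimal sketch of the generic learned-compression loop: a predictive
# model supplies next-byte probabilities, and an ideal entropy coder
# spends -log2(p) bits on each byte that occurs. `predict_next` is an
# illustrative placeholder, not TRACE's interface.
import math

def ideal_coded_bits(data: bytes, predict_next) -> float:
    """Bits an ideal arithmetic coder would emit for `data`."""
    total_bits = 0.0
    history = b""
    for byte in data:
        probs = predict_next(history)          # 256-entry distribution
        total_bits += -math.log2(probs[byte])  # cost of the observed byte
        history += bytes([byte])               # decoder mirrors these steps
    return total_bits

# A uniform model is the break-even baseline: exactly 8 bits per byte.
uniform = lambda history: [1.0 / 256] * 256
print(ideal_coded_bits(b"hello world", uniform))  # 88.0
```

Replacing `uniform` with a trained predictor is what buys the compression ratio; TRACE's contribution is making that predictor cheap enough to run at practical speeds.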

Key Contributions

  1. Transformer-Centric Compression: Traditional deep-learning compressors frequently utilize RNNs to model historical dependencies, which results in high latency due to their sequential nature. TRACE shifts to a transformer-based approach to parallelize the construction of these dependencies, thereby reducing inference latency. However, the conventional transformer's computational heft necessitated the development of optimizations to render it suitable for compression tasks.
  2. Compression-Friendly Transformer: TRACE employs a single-layer transformer, optimized for lossless compression. It introduces the following innovations to ensure efficacy and efficiency:
  • Latency-to-Compression Ratio (LCR): A novel metric to balance compression time against compression ratio improvements.
  • Byte-Grouping: Decouples the byte-vector dimension from the transformer's hidden dimension, giving each token a broader byte context while reducing redundancy.
  • Shared Feed-Forward Networks (Shared-ffn): Increases effective model capacity without adding parameters, enhancing the transformer's ability to detect the patterns needed for efficient compression (both schemes are sketched after this list).
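
This summary does not spell out the exact wiring of either scheme, so the PyTorch sketch below should be read as one plausible arrangement: the group size, all dimensions, and the way group slices are split back out for the shared FFN are assumptions, not the paper's exact design.

```python
# Plausible sketch of Byte-Grouping + Shared-ffn around a single-layer
# transformer. Group size, dimensions, and the slice-wise FFN sharing
# are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class GroupedTransformerLayer(nn.Module):
    def __init__(self, group_size=4, byte_dim=32, hidden=128, ffn_dim=256):
        super().__init__()
        assert hidden % group_size == 0
        self.g, self.sub = group_size, hidden // group_size
        self.embed = nn.Embedding(256, byte_dim)
        # Byte-Grouping: pack `group_size` byte embeddings into one token,
        # decoupling byte_dim from the transformer's hidden dimension.
        self.pack = nn.Linear(group_size * byte_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        # Shared-ffn: one FFN reused on every group's slice of the token,
        # exercising the same weights g times instead of keeping g copies.
        self.shared_ffn = nn.Sequential(
            nn.Linear(self.sub, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, self.sub))
        self.norm1, self.norm2 = nn.LayerNorm(hidden), nn.LayerNorm(hidden)
        self.head = nn.Linear(hidden, group_size * 256)  # logits per byte slot

    def forward(self, byte_ids):                     # (B, L) ints in [0, 256)
        B, L = byte_ids.shape
        x = self.embed(byte_ids).view(B, L // self.g, -1)
        x = self.pack(x)                             # (B, L/g, hidden)
        a, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + a)
        s = x.view(B, -1, self.g, self.sub)          # split token into g slices
        s = s + self.shared_ffn(s)                   # same FFN on each slice
        x = self.norm2(s.view(B, -1, self.g * self.sub))
        return self.head(x).view(B, L, 256)          # per-byte next-byte logits

out = GroupedTransformerLayer()(torch.randint(0, 256, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 256])
```

The design intuition is visible in the shapes: grouping shortens the attention sequence by a factor of g (cheaper attention), while the shared FFN is applied g times per token, so its weights do extra work without growing the parameter count.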

  3. Control of Parameter Updates: To mitigate the computational burden of frequent parameter updates, TRACE integrates a Back-Propagation (BP) Controller. This component uses the cross-entropy loss to perform updates selectively, achieving a 30% average speedup with minimal impact on compression ratios (see the sketch below).
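
The summary gives only the gating signal (cross-entropy loss) and the payoff (a ~30% speedup), so the controller below is a hedged sketch of the idea: skip gradient updates while the loss stays near its recent average, and pay for backpropagation only when predictions degrade. The moving-average rule and its thresholds are assumptions, not the paper's exact criterion.

```python
# Illustrative BP-controller-style gate: run the expensive backward pass
# only when the cross-entropy loss suggests the model is mispredicting.
# The moving-average threshold is an assumed rule; the paper's exact
# criterion may differ.
class BPController:
    def __init__(self, slack=0.05, momentum=0.9):
        self.avg = None          # running average of recent losses
        self.slack = slack       # tolerated rise before forcing an update
        self.momentum = momentum

    def should_update(self, loss: float) -> bool:
        if self.avg is None:     # always train on the first batch
            self.avg = loss
            return True
        update = loss > self.avg * (1.0 + self.slack)  # loss spiked: adapt
        self.avg = self.momentum * self.avg + (1.0 - self.momentum) * loss
        return update

# In the compression loop (names illustrative):
#   loss = cross_entropy(model(batch), targets)
#   if controller.should_update(loss.item()):
#       loss.backward(); optimizer.step(); optimizer.zero_grad()
```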

Experimental Results

TRACE was evaluated against several state-of-the-art compressors on diverse datasets, including text, audio, image, floating-point data, and heterogeneous data streams. The results highlight TRACE's superior performance in terms of computational efficiency and compression effectiveness:

  • Compression Ratios: TRACE achieved compression ratios of 5.29:1 on Enwik9 and 4.58:1 on BookCorpus, outperforming traditional compressors like Gzip and 7z by significant margins. On non-text datasets, TRACE also demonstrated higher compression ratios compared to Dzip and maintained robust performance across various data types.
  • Compression Speeds: TRACE showcased impressive compression speeds, averaging 952.3 kb/min, marking a 3x improvement over Dzip. The inclusion of the BP controller further augmented the compression speed to 1228.8 kb/min, with only a modest drop (1-2%) in compression ratios.
  • Computational Efficiency: TRACE's optimization led to reduced peak memory usage and inference latencies, enabling it to handle large batch sizes efficiently. For instance, TRACE achieved a 1.4GB peak GPU memory usage and a throughput of 81,594 bytes per second with a batch size of 2048.

Implications and Future Directions

The introduction of TRACE marks a significant step forward in the field of data compression, particularly for environments where speed and efficiency are paramount. By leveraging the parallelism inherent in transformers and optimizing for compression-specific tasks, TRACE bridges the gap between high compression ratios and fast execution times.

The practical implications of TRACE are manifold:

  • Energy Efficiency: Faster compression times translate to lower energy consumption, which is crucial for large-scale data centers and cloud service providers.
  • Scalability: The efficient use of GPU memory and computational resources ensures that TRACE can scale to larger datasets and more extensive deployments.
  • Broad Applicability: The general-purpose nature of TRACE makes it suitable for a wide range of data types, from textual data to multimedia and beyond.

Moving forward, it will be interesting to explore further optimizations tailored to specific data types and to investigate the integration of TRACE with other modern compression algorithms. Additionally, extending TRACE to leverage advancements in hardware acceleration and distributed computing could further enhance its performance.

Conclusion

TRACE exemplifies a forward-thinking approach to lossless data compression, addressing the critical need for speed without compromising on compression efficacy. Through methodical enhancements to transformer architecture and intelligent control of parameter updates, TRACE sets a new benchmark for deep-learning-based compressors, demonstrating that high performance and robust compression ratios can indeed coexist.
