
Integer Discrete Flows and Lossless Compression (1905.07376v4)

Published 17 May 2019 in cs.LG, cs.CV, and stat.ML

Abstract: Lossless compression methods shorten the expected representation size of data without loss of information, using a statistical model. Flow-based models are attractive in this setting because they admit exact likelihood optimization, which is equivalent to minimizing the expected number of bits per message. However, conventional flows assume continuous data, which may lead to reconstruction errors when quantized for compression. For that reason, we introduce a flow-based generative model for ordinal discrete data called Integer Discrete Flow (IDF): a bijective integer map that can learn rich transformations on high-dimensional data. As building blocks for IDFs, we introduce a flexible transformation layer called integer discrete coupling. Our experiments show that IDFs are competitive with other flow-based generative models. Furthermore, we demonstrate that IDF based compression achieves state-of-the-art lossless compression rates on CIFAR10, ImageNet32, and ImageNet64. To the best of our knowledge, this is the first lossless compression method that uses invertible neural networks.

Citations (153)

Summary

  • The paper introduces Integer Discrete Flows, a novel flow-based generative model tailored for ordinal discrete data.
  • It employs integer discrete coupling layers and tractable discrete distributions to bypass quantization errors and optimize compression.
  • Experiments show that IDFs are competitive with other flow-based generative models and report state-of-the-art lossless compression rates on CIFAR10, ImageNet32, and ImageNet64.

Integer Discrete Flows and Lossless Compression

Introduction

The paper "Integer Discrete Flows and Lossless Compression" (1905.07376) proposes an approach to lossless compression based on flow-based generative models designed for ordinal discrete data. Lossless compression matters wherever information must be preserved exactly, such as medical imaging and archival storage. Conventional flows assume continuous data, which can introduce reconstruction errors once the values are quantized for compression. This work introduces Integer Discrete Flows (IDFs), which model high-dimensional ordinal data directly and report strong empirical results on image datasets such as CIFAR10, ImageNet32, and ImageNet64.

Integer Discrete Flows

IDFs adapt flow-based generative modeling to discrete data through integer discrete coupling layers. Traditional flows rely on a continuous change of variables, which complicates compression because quantizing their outputs can introduce errors. An IDF is instead a bijective map on ordinal discrete (integer-valued) data, so every encoded input can be recovered exactly. This avoids the reconstruction errors that arise when continuous models are repurposed for discrete data.

Figure 1: Overview of IDF based lossless compression. An image $x$ is transformed to a latent representation $z$ with a tractable distribution $p_Z(\cdot)$. An entropy encoder takes $z$ and $p_Z(\cdot)$ as input, and produces a bitstream $c$. To obtain $x$, the decoder uses $p_Z(\cdot)$ and $c$ to reconstruct $z$. Subsequently, $z$ is mapped to $x$ using the inverse of the IDF.
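
The pipeline in Figure 1 rests on a general property of bijections between discrete sets, written out here as a short worked step rather than quoted from the paper. The symbol $f$ for the IDF and $\ell(x)$ for the code length are notation assumed for this sketch. Because $f$ maps integers to integers one-to-one, probability mass is carried over unchanged, with no Jacobian term as in continuous flows, and an ideal entropy coder spends about the negative log-probability of the latent:

```latex
z = f(x), \qquad p_X(x) = p_Z\bigl(f(x)\bigr), \qquad
\ell(x) \approx -\log_2 p_Z\bigl(f(x)\bigr) \ \text{bits}
```

Maximizing the likelihood of the data under $p_X$ is therefore the same objective as minimizing the expected number of bits per message, as the abstract states.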

Methodology

The core of the IDF approach is the integer discrete coupling layer, an invertible transformation that keeps data within its discrete space. The resulting integer latents can be passed directly to an efficient entropy coder such as rANS to achieve high compression rates. Because the mapping is defined on integers, it applies directly to pixel-level image compression without any quantization step, and so without the reconstruction errors that quantizing a continuous flow can cause.
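
To make the idea of an invertible integer coupling concrete, here is a minimal sketch in plain Python. It assumes the common additive coupling form, in which one half of the input is left unchanged and the other half is shifted by a rounded function of the first half. The function `toy_translation` is a hypothetical stand-in for the neural network, and the names `coupling_forward` and `coupling_inverse` are illustrative only, not the authors' code.

```python
from typing import Callable, List, Tuple

Translation = Callable[[List[int]], List[float]]


def toy_translation(x_a: List[int]) -> List[float]:
    # Hypothetical stand-in for a learned network t(x_a); any deterministic
    # real-valued function of x_a works for the invertibility argument.
    return [0.37 * v - 1.2 for v in x_a]


def coupling_forward(
    x_a: List[int], x_b: List[int], t: Translation = toy_translation
) -> Tuple[List[int], List[int]]:
    # Round the translation so the shift is an integer and z_b stays integer.
    shift = [round(v) for v in t(x_a)]
    z_b = [b + s for b, s in zip(x_b, shift)]
    # x_a passes through unchanged, which is what makes the layer invertible.
    return list(x_a), z_b


def coupling_inverse(
    z_a: List[int], z_b: List[int], t: Translation = toy_translation
) -> Tuple[List[int], List[int]]:
    # z_a equals x_a, so the same integer shift can be recomputed and removed.
    shift = [round(v) for v in t(z_a)]
    x_b = [b - s for b, s in zip(z_b, shift)]
    return list(z_a), x_b


if __name__ == "__main__":
    x_a, x_b = [3, 200, 17], [5, 0, 255]
    z_a, z_b = coupling_forward(x_a, x_b)
    assert coupling_inverse(z_a, z_b) == (x_a, x_b)
```

The inverse is exact because the shift depends only on the half that is passed through unchanged, so the decoder recomputes the identical integer shift. No floating-point value ever has to be inverted.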

The paper also describes tractable discrete distributions for the latent variables and a lower triangular coupling technique, both aimed at keeping encoding and decoding computationally efficient across different network depths and architectures.
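
As an illustration of what a tractable distribution over integers can look like, the sketch below uses a discretized logistic: the probability of an integer is the logistic CDF mass on the unit interval around it. This choice, the parameter names `mu` and `scale`, and the helper names are assumptions made for the example. The block also computes the ideal code length that an entropy coder such as rANS would approach.

```python
import math
from typing import Iterable


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def discretized_logistic_pmf(z: int, mu: float, scale: float) -> float:
    # Probability mass of the interval [z - 0.5, z + 0.5] under a logistic
    # distribution with location mu and scale `scale`.
    upper = sigmoid((z + 0.5 - mu) / scale)
    lower = sigmoid((z - 0.5 - mu) / scale)
    return upper - lower


def ideal_code_length_bits(zs: Iterable[int], mu: float, scale: float) -> float:
    # Sum of -log2 p(z): the bit count an ideal entropy coder would need,
    # treating the latent dimensions as independent.
    return sum(-math.log2(discretized_logistic_pmf(z, mu, scale)) for z in zs)


if __name__ == "__main__":
    total = sum(discretized_logistic_pmf(z, 0.3, 2.0) for z in range(-200, 201))
    print(f"total mass on [-200, 200]: {total:.6f}")
    print(f"bits for [0, 1, -2]: {ideal_code_length_bits([0, 1, -2], 0.3, 2.0):.3f}")
```

Latents near `mu` receive the most mass and so cost the fewest bits, which is why a flow that concentrates its latents compresses well.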

Figure 2: Left: An example from the ER + BCa histology dataset. Right: 625 IDF samples of size 80×80 px.

Experimental Results

IDFs were evaluated on several standard benchmarks and compared against existing lossless compression methods. The results on CIFAR10 and on image patches from the histology dataset stand out: there, IDFs achieved better compression rates than baselines such as the JPEG2000 codec and the learned Bit-Swap method.

Figure 3: Progressive display of the data stream for images taken from the test set of ImageNet64. From top to bottom row, each image uses approximately 15%, 30%, 60% and 100% of the stream, where the remaining dimensions are sampled. Best viewed electronically.

Theoretical and Practical Implications

The research outlines a promising avenue for lossless compression with neural networks that respect the discrete nature of digital data. The same approach may extend beyond images to other discrete media such as video and audio. Future work may extend the IDF framework to more complex data structures or refine the coupling mechanisms to further reduce computational cost.

Conclusion

Integer Discrete Flows show that flow-based generative models can be built to operate directly on discrete data, combining statistical modeling with invertible neural networks to achieve efficient lossless compression. The work makes a case for designing generative models around the discrete structure of the data they compress.
