Emergent Mind

Abstract

In-memory columnar databases have become mainstream over the last decade and have greatly accelerated the processing of large volumes of data through multi-core parallelism and in-memory compression, thereby eliminating the usual bottlenecks associated with disk-based databases. In scenarios where the data volume grows into terabytes and petabytes, however, keeping all the data in memory is exorbitantly expensive. Hence, the data is compressed efficiently, using algorithms that let query processing exploit multi-core parallelism. Several methods have been studied for compressing the column array after dictionary encoding. In this paper, we present two novel optimizations of these compression techniques, Block Size Optimized Cluster Encoding and Block Size Optimized Indirect Encoding, which perform better than their predecessors. Finally, we propose heuristics for choosing the best encoding among common compression schemes.
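To make the starting point concrete, the following is a minimal sketch of plain dictionary encoding, the step that produces the column array the abstract refers to. It is illustrative only and is not the paper's implementation: the function names `dictionary_encode` and `dictionary_decode`, the sorted dictionary, and the sample column are assumptions chosen for this example. The cluster and indirect encodings discussed in the paper would operate on the resulting array of value IDs and are not shown here.

```python
def dictionary_encode(column):
    """Replace each value with its position in a sorted dictionary of distinct values.

    Hypothetical helper for illustration; sorting the dictionary is an
    assumption, not something the abstract specifies.
    """
    dictionary = sorted(set(column))
    position = {value: i for i, value in enumerate(dictionary)}
    value_ids = [position[value] for value in column]
    return dictionary, value_ids


def dictionary_decode(dictionary, value_ids):
    """Rebuild the original column from the dictionary and the value IDs."""
    return [dictionary[i] for i in value_ids]


# Made-up sample column, used only to exercise the two helpers.
column = ["Berlin", "Paris", "Berlin", "Rome", "Paris", "Berlin"]
dictionary, value_ids = dictionary_encode(column)
restored = dictionary_decode(dictionary, value_ids)
```

In this sketch the column array holds small integers instead of the original strings, which is what makes further compression of that array worthwhile when few distinct values repeat often.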
