An Experimental Study of Distributed Quantile Estimation (1508.05710v1)

Published 24 Aug 2015 in cs.DB

Abstract: Quantiles are important statistics used to describe the distribution of a dataset. Given the quantiles of a dataset, we can readily characterize its distribution, which is a fundamental task in data analysis. However, computing quantiles directly is often infeasible because of memory limitations. Furthermore, in many settings, such as the data streaming and sensor network models, even the data size is unpredictable. Although quantile computation has been widely studied, most of that work addresses the sequential setting. In this paper, we study several quantile computation algorithms in the distributed setting and compare them experimentally in terms of space usage, running time, and accuracy. Our work focuses on approximate quantile algorithms that provide error bounds. Approximate quantiles have received more attention than exact ones because they are often faster and can be adapted more easily to the distributed setting, while still giving sufficiently good statistical information about the data.
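To make "approximate quantiles with error bounds in the distributed setting" concrete, here is a minimal Python sketch. It is not taken from the paper and is not claimed to be one of the algorithms it evaluates. It assumes a simple scheme in which each node sorts its local data and forwards every s-th element with weight s, where s is about eps times the local size, and a coordinator merges these weighted samples. The names `local_summary`, `merge`, and `query` are hypothetical. Under these assumptions, and provided each node holds at least 1/eps items, the rank of the returned value is within about eps * n of the target rank phi * n.

```python
import math
from typing import List, Sequence, Tuple

# A summary is a list of (value, weight) pairs. Illustrative only; the
# scheme and all names here are assumptions, not the paper's algorithms.
Summary = List[Tuple[float, int]]


def local_summary(data: Sequence[float], eps: float) -> Summary:
    """Keep every step-th element of the sorted local data, each with weight step."""
    ordered = sorted(data)
    step = max(1, math.floor(eps * len(ordered)))
    # Indices step-1, 2*step-1, ... correspond to local ranks step, 2*step, ...
    return [(ordered[i], step) for i in range(step - 1, len(ordered), step)]


def merge(summaries: Sequence[Summary]) -> Summary:
    """Coordinator side: pool all weighted samples and sort them by value."""
    merged = [pair for summary in summaries for pair in summary]
    merged.sort(key=lambda pair: pair[0])
    return merged


def query(merged: Summary, phi: float, n: int) -> float:
    """Return the first sample whose cumulative weight reaches rank phi * n."""
    if not merged:
        raise ValueError("empty summary")
    target = phi * n
    cumulative = 0
    for value, weight in merged:
        cumulative += weight
        if cumulative >= target:
            return value
    # Total weight can fall slightly short of n; fall back to the largest sample.
    return merged[-1][0]


if __name__ == "__main__":
    # Four hypothetical nodes, each holding a slice of the values 1..10000.
    nodes = [[v for v in range(1, 10001) if v % 4 == j] for j in range(4)]
    merged = merge([local_summary(d, eps=0.01) for d in nodes])
    print(len(merged), query(merged, phi=0.5, n=10000))
```

Each node sends roughly 1/eps samples instead of its full data, which is the space-versus-accuracy trade-off the abstract alludes to.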
