Optimal Round and Sample-Size Complexity for Partitioning in Parallel Sorting (2204.04599v6)
Abstract: State-of-the-art parallel sorting algorithms for distributed-memory architectures are based on computing a balanced partitioning via sampling and histogramming. By finding samples that partition the sorted keys into evenly sized chunks, these algorithms minimize the number of communication rounds required. Histogramming (computing the positions of samples) guides sampling, enabling a decrease in the overall number of samples collected. We derive lower and upper bounds on the number of sampling/histogramming rounds required to compute a balanced partitioning. We improve on prior results to demonstrate that when using $p$ processors, $O(\log^* p)$ rounds with $O(p/\log^* p)$ samples per round suffice. We complement this with a matching lower bound showing that any algorithm using $O(p)$ samples per round requires at least $\Omega(\log^* p)$ rounds. Additionally, we prove an $\Omega(p \log p)$ lower bound on sample size for one round, showing that existing one-round algorithms (sample sort, AMS sort, and HSS) have optimal sample-size complexity. To derive the lower bound, we propose a hard randomized input distribution and apply classical results from the distribution theory of runs.
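To illustrate the one-round regime the abstract discusses, here is a minimal sketch of splitter selection in the style of sample sort: draw roughly $p \log p$ samples (an oversampled set), sort them, and take evenly spaced samples as splitters; a histogramming step then counts how many keys fall below each splitter. All function and parameter names here are illustrative, not taken from the paper.

```python
import bisect
import random

def choose_splitters(keys, p, oversample=8):
    """Pick p-1 splitters from an oversampled random sample.

    A one-round sample-sort-style sketch: with enough samples per
    processor (on the order of log p), buckets are balanced with
    high probability. `oversample` is an illustrative knob.
    """
    sample = sorted(random.sample(keys, min(len(keys), p * oversample)))
    # Take every oversample-th sampled key as a splitter.
    return [sample[i * oversample] for i in range(1, p)]

def histogram(keys, splitters):
    """Count how many keys fall strictly below each splitter.

    In a distributed setting, each processor would compute these
    local counts and a reduction would aggregate them; here we
    compute them for a single key set.
    """
    s = sorted(keys)
    return [bisect.bisect_left(s, t) for t in splitters]
```

For example, `histogram(keys, choose_splitters(keys, 4))` yields three counts that, when the sample is large enough, split the keys into four roughly equal buckets.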