
Evaluating Hadoop Clusters with TPCx-HS (1509.03486v3)

Published 11 Sep 2015 in cs.DC

Abstract: The growing complexity and variety of Big Data platforms make it both difficult and time-consuming for system users to properly set up and operate the systems. Another challenge is comparing the platforms in order to choose the most appropriate one for a particular application. All these factors motivate the need for a standardized Big Data benchmark that can help users in the process of platform evaluation. TPCx-HS [1][2] was recently released as the first standardized Big Data benchmark designed to stress test a Hadoop cluster. The goal of this study is to evaluate and compare how the network setup influences the performance of a Hadoop cluster. In particular, experiments were performed using shared and dedicated 1Gbit networks utilized by the same Cloudera Hadoop Distribution (CDH) cluster setup. The TPCx-HS benchmark, which is very network intensive, was used to stress test and compare both cluster setups. All the presented results were obtained using the officially available version [1] of the benchmark, but they are not comparable with the officially reported results and are meant as an experimental evaluation, not audited by any external organization. As expected, the dedicated 1Gbit network setup performed much faster than the shared 1Gbit setup. What was surprising, however, is the negligible price difference between the two cluster setups, which pays off with a multifold performance return.

Authors (2)
  1. Todor Ivanov (8 papers)
  2. Sead Izberovic (3 papers)
Citations (1)
