Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 175 tok/s
Gemini 2.5 Pro 52 tok/s Pro
GPT-5 Medium 36 tok/s Pro
GPT-5 High 38 tok/s Pro
GPT-4o 92 tok/s Pro
Kimi K2 218 tok/s Pro
GPT OSS 120B 442 tok/s Pro
Claude Sonnet 4.5 38 tok/s Pro
2000 character limit reached

Characterization and Architectural Implications of Big Data Workloads (1506.07943v1)

Published 26 Jun 2015 in cs.DC, cs.DB, and cs.PF

Abstract: Big data areas are expanding in a fast way in terms of increasing workloads and runtime systems, and this situation imposes a serious challenge to workload characterization, which is the foundation of innovative system and architecture design. The previous major efforts on big data benchmarking either propose a comprehensive but a large amount of workloads, or only select a few workloads according to so-called popularity, which may lead to partial or even biased observations. In this paper, on the basis of a comprehensive big data benchmark suite---BigDataBench, we reduced 77 workloads to 17 representative workloads from a micro-architectural perspective. On a typical state-of-practice platform---Intel Xeon E5645, we compare the representative big data workloads with SPECINT, SPECCFP, PARSEC, CloudSuite and HPCC. After a comprehensive workload characterization, we have the following observations. First, the big data workloads are data movement dominated computing with more branch operations, taking up to 92% percentage in terms of instruction mix, which places them in a different class from Desktop (SPEC CPU2006), CMP (PARSEC), HPC (HPCC) workloads. Second, corroborating the previous work, Hadoop and Spark based big data workloads have higher front-end stalls. Comparing with the traditional workloads i. e. PARSEC, the big data workloads have larger instructions footprint. But we also note that, in addition to varied instruction-level parallelism, there are significant disparities of front-end efficiencies among different big data workloads. Third, we found complex software stacks that fail to use state-of-practise processors efficiently are one of the main factors leading to high front-end stalls. For the same workloads, the L1I cache miss rates have one order of magnitude differences among diverse implementations with different software stacks.

Citations (8)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.