
Storage and Memory Characterization of Data Intensive Workloads for Bare Metal Cloud (1805.08332v2)

Published 22 May 2018 in cs.DC

Abstract: As the cost-per-byte of storage systems decreases dramatically, SSDs are finding their way into emerging cloud infrastructure. A similar trend is under way in the main memory subsystem, as advanced DRAM technologies with higher capacity, frequency, and number of channels are being deployed for cloud-scale solutions, especially in non-virtualized environments where cloud subscribers can specify the exact configuration of the underlying hardware. Given the performance sensitivity of standard workloads to memory hierarchy parameters, it is important to understand the role of memory and storage for data-intensive workloads. In this paper, we investigate how the choice of DRAM (high-end vs. low-end) impacts the performance of Hadoop, Spark, and MPI-based Big Data workloads in the presence of different storage types on bare metal cloud. Through a methodical experimental setup, we analyzed the impact of DRAM capacity, operating frequency, number of channels, storage type, and scale-out factors on the performance of these popular frameworks. Based on micro-architectural analysis, we classified data-intensive workloads into three groups: I/O bound, compute bound, and memory bound. The characterization results show that DRAM capacity, frequency, and number of channels do not play a significant role in the performance of the studied Hadoop workloads, as they are mostly I/O bound. On the other hand, our results reveal that iterative tasks (e.g., machine learning) in Spark and MPI benefit from high-end DRAM, in particular high frequency and a large number of channels, as they are memory or compute bound. Our results show that using a PCIe SSD cannot shift the bottleneck from storage to memory, although it can change the workload behavior from I/O bound to compute bound.

Citations (6)

