Emergent Mind

Abstract

This paper presents a survey of architectural features among four generations of Intel server processors (Sandy Bridge, Ivy Bridge, Haswell, and Broad- well) with a focus on performance with floating point workloads. Starting on the core level and going down the memory hierarchy we cover instruction throughput for floating-point instructions, L1 cache, address generation capabilities, core clock speed and its limitations, L2 and L3 cache bandwidth and latency, the impact of Cluster on Die (CoD) and cache snoop modes, and the Uncore clock speed. Using microbenchmarks we study the influence of these factors on code performance. This insight can then serve as input for analytic performance models. We show that the energy efficiency of the LINPACK and HPCG benchmarks can be improved considerably by tuning the Uncore clock speed without sacrificing performance, and that the Graph500 benchmark performance may profit from a suitable choice of cache snoop mode settings.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.