Emergent Mind

Abstract

Efficiency in instruction fetching is critical to performance, and this requires the primary structures -- L1 instruction caches (L1i), branch target buffers (BTB) and instruction TLBs (iTLB) -- to have the requisite information when needed. This paper proposes a high-level program sequencing mechanism and a coupled technique for block movement, instruction presending, where instruction cache blocks, BTB entries, and iTLB entries are autonomously moved (or sent) from the secondary to the primary structures in a "just in time" fashion so that they are available when needed. Empirical results are presented to demonstrate the efficacy of the high-level sequencing mechanism and block movement. Presending is especially effective for benchmarks with a high base MPKI, where the movement of instruction blocks (and BTB/iTLB entries) from secondary to primary structures is frequent. Presending reduces the number of misses in primary structures by an order of magnitude as compared to state-of-the-art instruction prefetching schemes, in many cases, while allowing the processor to operate with small-sized primary BTBs. This reduction in misses results in performance improvements in cases where front-end efficiency is important.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.