Instruction Block Movement with Coupled High-Level Program Sequencing (2406.06738v1)
Abstract: Efficient instruction fetching is critical to performance, and it requires the primary structures, namely L1 instruction caches (L1i), branch target buffers (BTB), and instruction TLBs (iTLB), to hold the requisite information when it is needed. This paper proposes a high-level program sequencing mechanism and a coupled block movement technique, instruction presending, in which instruction cache blocks, BTB entries, and iTLB entries are autonomously moved (or sent) from the secondary to the primary structures in a "just in time" fashion so that they are available when needed. Empirical results are presented to demonstrate the efficacy of the high-level sequencing mechanism and block movement. Presending is especially effective for benchmarks with a high base MPKI, where the movement of instruction blocks (and BTB/iTLB entries) from secondary to primary structures is frequent. In many cases, presending reduces the number of misses in primary structures by an order of magnitude compared to state-of-the-art instruction prefetching schemes, while allowing the processor to operate with small primary BTBs. This reduction in misses yields performance improvements in cases where front-end efficiency is important.