
Abstract

Coarse-grain reconfigurable architectures (CGRAs) are gaining traction thanks to their performance and power efficiency. Because a substantial portion of program execution time is spent in tight loops, using CGRAs to accelerate those loops holds great potential for significant overall performance gains. However, loop parallelization on CGRAs is challenging because of loop-carried data dependencies. Traditionally, loop-carried dependencies are handled by spilling dependent values out of the reconfigurable array to a memory medium and then feeding them back into the grid. This round trip imposes additional latency and logic that impede performance and limit parallelism. In this paper, we present the Dependency Resolved CGRA (DR-CGRA) architecture, which is designed to accelerate the execution of tight loops. DR-CGRA, which is based on a massively multithreaded CGRA, runs each iteration as a separate CGRA thread and maps loop-carried data dependencies to inter-thread communication inside the grid. This design passes data-dependent values across loop iterations without spilling them out of the grid. The proposed DR-CGRA architecture was evaluated on various SPEC CPU 2017 benchmarks. The results show significant performance improvements, with average speedups ranging from 2.1x to 4.5x and an overall average of 3.1x compared to a state-of-the-art CGRA architecture.
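
To make the idea of mapping loop-carried dependencies to inter-thread communication more concrete, the following is a minimal software analogy, not the DR-CGRA hardware design. It assumes a hypothetical loop whose accumulator `acc` depends on the previous iteration, runs each iteration as its own Python thread, and forwards the carried value through one-slot queues that stand in for links inside the grid. All names (`sequential`, `iteration_thread`, `threaded`, `links`) are illustrative and do not come from the paper.

```python
import queue
import threading


def sequential(xs):
    """Reference loop: acc is a loop-carried dependency."""
    acc = 0
    out = []
    for x in xs:
        acc = acc * 2 + x  # needs acc from the previous iteration
        out.append(acc)
    return out


def iteration_thread(i, x, inbox, outbox, out):
    """One loop iteration, run as its own thread (hypothetical model)."""
    local = x              # work that does not depend on earlier iterations
    acc = inbox.get()      # wait for the value carried from iteration i-1
    acc = acc * 2 + local  # the dependent part of the loop body
    out[i] = acc
    outbox.put(acc)        # forward the carried value to iteration i+1


def threaded(xs):
    """Run every iteration as a thread; carried values flow thread to thread."""
    n = len(xs)
    # links[i] carries acc into iteration i; links[n] receives the final value.
    links = [queue.Queue(maxsize=1) for _ in range(n + 1)]
    out = [None] * n
    threads = [
        threading.Thread(
            target=iteration_thread,
            args=(i, xs[i], links[i], links[i + 1], out),
        )
        for i in range(n)
    ]
    for t in threads:
        t.start()
    links[0].put(0)  # initial value of acc before the first iteration
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sequential(data))
    print(threaded(data))
```

In this sketch the carried value never passes through a shared array or global variable; it moves directly from one iteration's thread to the next, which loosely mirrors the abstract's point about keeping dependent values inside the grid rather than spilling them to memory.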
