Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays

Published 8 May 2017 in cs.AR | (1705.02730v1)

Abstract: FPGA vendors have recently started focusing on OpenCL for FPGAs because of its ability to leverage the parallelism inherent to heterogeneous computing platforms. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive compilation times (specifically the FPGA place and route times) are a major stumbling block when using OpenCL tools from FPGA vendors. The long compilation times mean that the tools cannot effectively use just-in-time (JIT) compilation or runtime performance scaling. Coarse-grained overlays represent a possible solution by virtue of their coarse granularity and fast compilation. In this paper, we present a methodology for run-time compilation of OpenCL kernels to a DSP block based coarse-grained overlay, rather than directly to the fine-grained FPGA fabric. The proposed methodology allows JIT compilation and on-demand resource-aware kernel replication to better utilize available overlay resources, raising the abstraction level while reducing compile times significantly. We further demonstrate that this approach can even be used for run-time compilation of OpenCL kernels on the ARM processor of the embedded heterogeneous Zynq device.