Best-Effort FPGA Programming: A Few Steps Can Go a Long Way

Published 3 Jul 2018 in cs.AR, cs.DC, and cs.PF | (1807.01340v1)

Abstract: FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads. Nonetheless, such advantages come at the cost of sacrificing programmability. FPGA vendors and researchers attempt to improve the programmability through high-level synthesis (HLS) technologies that can directly generate hardware circuits from high-level language descriptions. However, reading through recent publications on FPGA designs using HLS, one often gets the impression that FPGA programming is still hard in that it leaves programmers to explore a very large design space with many possible combinations of HLS optimization strategies. In this paper we make two important observations and contributions. First, we demonstrate a rather surprising result: FPGA programming can be made easy by following a simple best-effort guideline of five refinement steps using HLS. We show that for a broad class of accelerator benchmarks from MachSuite, the proposed best-effort guideline improves the FPGA accelerator performance by 42-29,030x. Compared to the baseline CPU performance, the FPGA accelerator performance is improved from an average 292.5x slowdown to an average 34.4x speedup. Moreover, we show that the refinement steps in the best-effort guideline, consisting of explicit data caching, customized pipelining, processing element duplication, computation/communication overlapping and scratchpad reorganization, correspond well to the best practice guidelines for multicore CPU programming. Although our best-effort guideline may not always lead to the optimal solution, it substantially simplifies the FPGA programming effort, and will greatly support the wide adoption of FPGA-based acceleration by the software programming community.