Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

f-CNN$^{\text{x}}$: A Toolflow for Mapping Multi-CNN Applications on FPGAs (1805.10174v2)

Published 25 May 2018 in cs.CV, cs.AI, and cs.AR

Abstract: The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a particular task. The efficient mapping of multiple CNNs on a single FPGA device is a challenging task as the allocation of compute resources and external memory bandwidth needs to be optimised at design time. This paper proposes f-CNN${\text{x}}$, an automated toolflow for the optimised mapping of multiple CNNs on FPGAs, comprising a novel multi-CNN hardware architecture together with an automated design space exploration method that considers the user-specified performance requirements for each model to allocate compute resources and generate a synthesisable accelerator. Moreover, f-CNN${\text{x}}$ employs a novel scheduling algorithm that alleviates the limitations of the memory bandwidth contention between CNNs and sustains the high utilisation of the architecture. Experimental evaluation shows that f-CNN${\text{x}}$'s designs outperform contention-unaware FPGA mappings by up to 50% and deliver up to 6.8x higher performance-per-Watt over highly optimised GPU designs for multi-CNN systems.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
Citations (20)

Summary

We haven't generated a summary for this paper yet.