Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
166 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack (2004.10854v1)

Published 20 Apr 2020 in cs.DC, cs.LG, and cs.NE

Abstract: Specialized accelerators for tensor-operations, such as blocked-matrix operations and multi-dimensional convolutions, have been emerged as powerful architecture choices for high-performance Deep-Learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor-accelerators since the adaptation to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays on top of the physical FPGA configurable fabric. We propose an overlay ({\tau}-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than state-of-art.

Citations (4)

Summary

We haven't generated a summary for this paper yet.