Automatic multi-dimensional pipelining for high-level synthesis of dataflow accelerators (2309.03203v1)

Published 4 Aug 2023 in cs.AR and cs.PF

Abstract: In recent years, there has been surging demand for edge computing of image processing and machine learning workloads. This has reignited interest in custom hardware accelerators that can deliver higher performance and better energy efficiency. These workloads frequently exhibit affine memory accesses and constant loop bounds. In this paper, we introduce an ILP-based automatic scheduler for high-level synthesis, with a specific emphasis on aggressive pipelining to enhance parallelism. We propose a unified Integer Linear Programming (ILP) formulation that can identify pipelining opportunities along multiple loop and scalar dimensions. Our multi-dimensional pipelining technique encompasses both the inner-loop pipelining and the dataflow optimizations of Vitis HLS, while handling more general memory access patterns than the Vitis HLS dataflow optimization. Furthermore, our approach generates statically scheduled circuits, leading to improved resource efficiency. We have integrated our scheduler into HIR, an MLIR-based high-level synthesis compiler framework, and evaluated its performance. Our results show that, compared to Vitis HLS, our scheduler achieves more aggressive pipelining across multiple producer-consumer loop nests, reducing overall execution latency. The producer-consumer pipelined execution enabled by our scheduler yields an average speedup of 2.42X across a set of representative benchmarks compared to loop pipelining alone, and an average speedup of 1.30X over Vitis HLS with dataflow optimizations.
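To make the idea of producer-consumer pipelining more concrete, here is a minimal toy model in Python. It is not the paper's ILP formulation. It assumes two one-dimensional loops of `n` iterations, where the producer writes `A[j]` in iteration `j` and consumer iteration `i` reads `A[read_index(i)]`. It also assumes each loop has a fixed initiation interval (`ii_p`, `ii_c`) and a fixed latency (`lat_p`, `lat_c`). All function and parameter names are hypothetical. The sketch computes the smallest start offset for the consumer that respects every read-after-write dependence, and compares the resulting latency with running the loops back to back.

```python
from typing import Callable


def producer_write_time(j: int, ii_p: int, lat_p: int) -> int:
    # Toy assumption: producer iteration j is issued at j * ii_p and
    # its write of A[j] completes lat_p cycles later.
    return j * ii_p + lat_p


def min_consumer_offset(n: int, read_index: Callable[[int], int],
                        ii_p: int, lat_p: int, ii_c: int) -> int:
    # Smallest start cycle D such that consumer iteration i, issued at
    # D + i * ii_c, never reads A[read_index(i)] before it has been written.
    return max(0, max(producer_write_time(read_index(i), ii_p, lat_p) - i * ii_c
                      for i in range(n)))


def sequential_latency(n: int, ii_p: int, lat_p: int,
                       ii_c: int, lat_c: int) -> int:
    # Consumer loop starts only after the last producer iteration finishes.
    return (n - 1) * ii_p + lat_p + (n - 1) * ii_c + lat_c


def pipelined_latency(n: int, read_index: Callable[[int], int],
                      ii_p: int, lat_p: int, ii_c: int, lat_c: int) -> int:
    # Consumer loop overlaps the producer, delayed by the minimal safe offset.
    d = min_consumer_offset(n, read_index, ii_p, lat_p, ii_c)
    producer_end = (n - 1) * ii_p + lat_p
    consumer_end = d + (n - 1) * ii_c + lat_c
    return max(producer_end, consumer_end)


if __name__ == "__main__":
    n = 100
    seq = sequential_latency(n, 1, 3, 1, 2)
    fwd = pipelined_latency(n, lambda i: i, 1, 3, 1, 2)          # same order
    rev = pipelined_latency(n, lambda i: n - 1 - i, 1, 3, 1, 2)  # reversed order
    print(seq, fwd, rev)  # 203 104 203
```

When the consumer reads elements in the same order the producer writes them, it can start after only 3 cycles, which roughly halves total latency. When it reads in reverse order, no overlap is possible. This dependence on the memory access pattern is why a scheduler must analyze the accesses, such as the affine ones mentioned in the abstract, before it can pipeline across loop nests.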

Citations (1)