Hardware Abstractions and Hardware Mechanisms to Support Multi-Task Execution on Coarse-Grained Reconfigurable Arrays (2301.00861v1)

Published 2 Jan 2023 in cs.AR

Abstract: Domain-specific accelerators are used in various computing systems ranging from edge devices to data centers. Coarse-grained reconfigurable arrays (CGRAs) represent an architectural midpoint between the flexibility of an FPGA and the efficiency of an ASIC and are a promising candidate for servicing multi-tasked workloads within an application domain. Unfortunately, scheduling multiple tasks onto a CGRA is challenging. CGRAs lack abstractions that capture hardware resources, leaving workload schedulers unable to reason about performance, energy, and utilization for different schedules. This work first proposes a CGRA architecture that can flexibly partition key resources, including the global buffer memory capacity, the global buffer memory bandwidth, and the compute resources. Partitioned resources serve as hardware abstractions that decouple compilation and resource allocation. The compiler uses these abstractions for coarse-grained resource mapping, and the scheduler uses them for flexible resource allocation at run time. We then propose two hardware mechanisms to support multi-task execution. A flexible-shape execution region increases the overall resource utilization by mapping multiple tasks with different resource requirements. Dynamic partial reconfiguration (DPR) enables a CGRA to rapidly update the hardware configuration as the scheduler makes decisions. We show that our abstraction enables automatic and efficient scheduling of multi-tasked workloads onto our target CGRA with high utilization, resulting in 1.05x-1.24x higher throughput and 23-28% lower latency in a multi-tasked cloud workload, and 60.8% reduced latency in an autonomous system workload, when compared to a baseline CGRA running one task at a time.
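
To make the idea of partitioned resources as a scheduling abstraction more concrete, here is a minimal Python sketch of a run-time scheduler that allocates abstract resource units to tasks. It is not the paper's scheduler or architecture: the names `TaskVariant`, `Cgra`, and `schedule`, the units "compute slices" and "buffer banks", the greedy policy, and all numbers are assumptions chosen for illustration, and reconfiguration cost is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskVariant:
    """A compiled task expressed in abstract resource units (hypothetical)."""
    name: str
    compute_slices: int  # assumed unit of partitioned compute resources
    buffer_banks: int    # assumed unit of partitioned global buffer
    cycles: int          # assumed run time when granted these resources


@dataclass
class Cgra:
    """Total partitionable resources of the array (illustrative numbers only)."""
    compute_slices: int
    buffer_banks: int


def schedule(cgra, variants):
    """Greedy scheduling: start each task as soon as its resources are free.

    Returns (timeline, makespan), where timeline maps a task name to its
    (start, finish) times in cycles.
    """
    free_c, free_b = cgra.compute_slices, cgra.buffer_banks
    running = []              # list of (finish_time, variant)
    pending = list(variants)
    timeline = {}
    now = 0
    while pending or running:
        # Start every pending task that fits into the currently free resources.
        waiting = []
        for v in pending:
            if v.compute_slices <= free_c and v.buffer_banks <= free_b:
                free_c -= v.compute_slices
                free_b -= v.buffer_banks
                running.append((now + v.cycles, v))
                timeline[v.name] = (now, now + v.cycles)
            else:
                waiting.append(v)
        pending = waiting
        if not running:
            raise ValueError("a task needs more resources than the array has")
        # Advance time to the earliest finishing task and release its resources.
        running.sort(key=lambda entry: entry[0])
        finish, done = running.pop(0)
        now = finish
        free_c += done.compute_slices
        free_b += done.buffer_banks
    return timeline, now


# Made-up workload: three tasks sharing an array of 8 slices and 16 banks.
tasks = [
    TaskVariant("A", compute_slices=4, buffer_banks=8, cycles=100),
    TaskVariant("B", compute_slices=4, buffer_banks=8, cycles=150),
    TaskVariant("C", compute_slices=2, buffer_banks=4, cycles=50),
]
timeline, makespan = schedule(Cgra(compute_slices=8, buffer_banks=16), tasks)
print(timeline, makespan)
```

In this toy run, tasks A and B start together, and C starts once A releases its slices and banks. The point of the sketch is only that the scheduler reasons about counts of resource units rather than about the placement details the compiler already resolved, which is the decoupling the abstract describes.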
