CIS: Composable Instruction Set for Data Streaming Applications (2407.00207v2)
Abstract: The enhanced efficiency of hardware accelerators, including Single Instruction Multiple Data (SIMD) architectures and Coarse-Grained Reconfigurable Architectures (CGRAs), is driving significant advancements in Artificial Intelligence and Machine Learning (AI/ML) applications. These applications frequently involve data streaming operations comprised of numerous vector calculations inherently amenable to parallelization. However, despite considerable progress in hardware accelerator design, their potential remains constrained by conventional instruction set architectures (ISAs). Traditional ISAs, primarily designed for microprocessors and accelerators, emphasize computation while often neglecting instruction composability and inter-instruction cooperation. This limitation results in rigid ISAs that are difficult to extend and suffer from large control overhead in their hardware implementations. To address this, we present a novel composable instruction set (CIS) architecture, designed with both spatial and temporal composability, making it well-suited for data streaming applications. The proposed CIS utilizes a small instruction set, yet efficiently implements complex, multi-level loop structures essential for accelerating data streaming workloads. Furthermore, CIS adopts a resource-centric approach, facilitating straightforward extension through the integration of new hardware resources, enabling the creation of custom, heterogeneous computing platforms. Our results comparing performance between the proposed CIS and other state-of-the-art ISAs demonstrate that a CIS-based architecture significantly outperforms existing solutions, achieving near-optimal processing element (PE) utilization.