A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference (2401.16872v2)

Published 30 Jan 2024 in cs.AR

Abstract: RISC-V processors encounter substantial challenges in deploying multi-precision deep neural networks (DNNs) due to their restricted precision support, constrained throughput, and suboptimal dataflow design. To tackle these challenges, a scalable RISC-V vector (RVV) processor, namely SPEED, is proposed to enable efficient multi-precision DNN inference through innovations in customized instructions, hardware architecture, and dataflow mapping. Firstly, dedicated customized RISC-V instructions are proposed based on RVV extensions, providing SPEED with fine-grained control over processing precision ranging from 4 to 16 bits. Secondly, a parameterized multi-precision systolic array unit is incorporated within the scalable module to enhance parallel processing capability and data reuse opportunities. Finally, a mixed multi-precision dataflow strategy, compatible with different convolution kernels and data precisions, is proposed to effectively improve data utilization and computational efficiency. We perform synthesis of SPEED in TSMC 28nm technology. The experimental results demonstrate that SPEED achieves a peak throughput of 287.41 GOPS and an energy efficiency of 1335.79 GOPS/W at 4-bit precision. Moreover, when compared to the pioneering open-source vector processor Ara, SPEED provides an area efficiency improvement of 2.04× and 1.63× under 16-bit and 8-bit precision conditions, respectively, which shows SPEED's significant potential for efficient multi-precision DNN inference.

Authors (5)

Chuanning Wang
Chao Fang
Xiao Wu
Zhongfeng Wang
Jun Lin

Summary

The paper introduces SPEED, a RISC-V vector processor that improves multi-precision DNN inference using tailored instructions and a parameterized systolic array.
It employs a mixed multi-precision dataflow strategy that optimizes memory access and boosts throughput while reducing off-chip data movement.
Synthesis in TSMC 28nm technology shows a peak throughput of 287.41 GOPS at 4-bit precision, and area efficiency gains of 2.04× and 1.63× over the Ara vector processor at 16-bit and 8-bit precision, respectively.

Review of a Scalable RISC-V Vector Processor for Efficient Multi-Precision DNN Inference

The need for efficient hardware that can run deep neural network (DNN) inference at multiple precisions has drawn significant interest toward RISC-V processors. This paper introduces SPEED, a scalable RISC-V vector processor designed to improve the performance and efficiency of multi-precision DNN inference. It targets three limitations of existing RISC-V processors: limited precision support, constrained throughput, and inefficient dataflow mapping.

Overview

SPEED makes several key contributions: the introduction of customized RISC-V instructions based on RVV extensions, a parameterized multi-precision systolic array unit, and a mixed multi-precision dataflow strategy. These innovations collectively improve data utilization and computational efficiency across varied DNN workloads.
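To make the idea of precision ranging from 4 to 16 bits concrete, the following minimal sketch packs signed low-precision operands into a fixed-width word, so that lower precision yields more operands per word. The 16-bit word width and the helper names `pack`, `unpack`, and `packed_dot` are assumptions made for illustration; this is not SPEED's actual instruction encoding or datapath.

```python
# Illustrative only: sub-word packing of signed low-precision operands.
# WORD_BITS and all helper names are hypothetical, not taken from the paper.
WORD_BITS = 16


def pack(values, bits):
    """Pack WORD_BITS // bits signed integers into one unsigned word."""
    lanes = WORD_BITS // bits
    assert len(values) == lanes, "need exactly one value per lane"
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        assert -(1 << (bits - 1)) <= v < (1 << (bits - 1)), "value out of range"
        word |= (v & mask) << (i * bits)  # two's-complement truncation
    return word


def unpack(word, bits):
    """Recover the signed lane values from a packed word."""
    lanes = WORD_BITS // bits
    mask = (1 << bits) - 1
    out = []
    for i in range(lanes):
        raw = (word >> (i * bits)) & mask
        if raw >= 1 << (bits - 1):  # sign-extend negative lanes
            raw -= 1 << bits
        out.append(raw)
    return out


def packed_dot(word_a, word_b, bits):
    """Dot product over the lanes of two packed words."""
    return sum(a * b for a, b in zip(unpack(word_a, bits), unpack(word_b, bits)))


# Four 4-bit lanes fit in one 16-bit word: 1*2 + (-2)*2 + 3*2 + (-4)*2 = -4
a = pack([1, -2, 3, -4], bits=4)
b = pack([2, 2, 2, 2], bits=4)
print(packed_dot(a, b, bits=4))  # -4
```

Under this assumed word width, one word carries one 16-bit, two 8-bit, or four 4-bit operands, which is the general reason lower precision can raise throughput on a fixed-width datapath.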

Key Innovations

Customized RISC-V Instructions: Dedicated instructions built on the RVV extensions give SPEED fine-grained control over processing precision, from 4 to 16 bits. This allows a single processor to run DNNs whose data use different precisions.
Hardware Architecture: SPEED incorporates a parameterized multi-precision systolic array unit within its scalable module. The array increases computational parallelism and creates opportunities for data reuse.
Dataflow Mapping Strategy: A mixed multi-precision dataflow strategy, compatible with different convolution kernels and data precisions, improves data utilization and computational efficiency. It raises throughput by optimizing memory access patterns and reducing the overhead of off-chip data movement.
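The data reuse that a systolic array provides can be sketched with a small cycle-level model. The example below simulates a generic output-stationary array for a matrix product; the function name `systolic_matmul`, the output-stationary choice, and the read-counting model are assumptions for illustration and do not describe SPEED's actual microarchitecture or dataflow.

```python
# Illustrative only: a generic output-stationary systolic array model.
# Not SPEED's design; the dataflow and all names here are assumptions.
def systolic_matmul(A, B):
    """Compute C = A @ B the way an M x N grid of processing elements would.

    Row i of A enters from the left delayed by i cycles, and column j of B
    enters from the top delayed by j cycles, so PE(i, j) sees the operand
    pair with index k = t - i - j at cycle t and accumulates it locally.
    """
    M, K, N = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"
    C = [[0] * N for _ in range(M)]
    cycles = M + N + K - 2  # last PE finishes at t = M + N + K - 3
    for t in range(cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j
                if 0 <= k < K:
                    C[i][j] += A[i][k] * B[k][j]
    # Each operand is read once at the array edge and then forwarded
    # between neighbouring PEs instead of being fetched again.
    edge_reads = M * K + K * N
    # Without forwarding, every multiply-accumulate fetches both operands.
    naive_reads = 2 * M * N * K
    return C, cycles, edge_reads, naive_reads


C, cycles, edge_reads, naive_reads = systolic_matmul([[1, 2], [3, 4]],
                                                     [[5, 6], [7, 8]])
print(C)                        # [[19, 22], [43, 50]]
print(cycles)                   # 4
print(edge_reads, naive_reads)  # 8 16
```

In this model the gap between `edge_reads` and `naive_reads` widens as the array grows, which is the general sense in which a systolic array exploits data reuse.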

Experimental Validation

SPEED was synthesized in TSMC 28nm technology. At 4-bit precision it reaches a peak throughput of 287.41 GOPS and an energy efficiency of 1335.79 GOPS/W. Compared with Ara, the open-source vector processor used as the baseline, SPEED improves area efficiency by 2.04× under 16-bit precision and by 1.63× under 8-bit precision.
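As a quick sanity check on the reported figures, dividing throughput by energy efficiency gives an implied power draw. This assumes both 4-bit numbers were measured at the same operating point, which the summary does not state; the variable names are illustrative.

```python
# Back-of-the-envelope check using the two reported 4-bit figures.
# Assumption: both values refer to the same operating point.
peak_gops = 287.41        # reported peak throughput, GOPS
gops_per_watt = 1335.79   # reported energy efficiency, GOPS/W

# (GOPS) / (GOPS/W) = W
implied_power_w = peak_gops / gops_per_watt
print(f"Implied power: {implied_power_w * 1000:.1f} mW")  # about 215.2 mW
```

Under that assumption the processor would draw roughly 0.2 W at peak 4-bit throughput, which fits the embedded and edge power range the review has in mind.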

Implications and Future Directions

SPEED's gains in DNN inference efficiency suggest use cases in embedded systems and edge computing, where energy efficiency and processing capability are both critical. The combination of customized RVV instructions and a precision-aware dataflow strategy could also inform research into specialized processors for other application domains.

Future work could explore further reductions in power consumption while maintaining or improving performance. Investigating how SPEED scales to more complex DNN models, particularly those used in advanced AI applications, would also clarify how well the architecture generalizes.

Conclusion

SPEED advances processor design for multi-precision DNN inference. By addressing the precision, throughput, and dataflow limitations of existing RISC-V processors, the work lays a foundation for future high-performance, energy-efficient processors tailored to deep learning inference.
