
xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems (2405.19065v1)

Published 29 May 2024 in cs.AR and cs.LG

Abstract: Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.


Summary

  • The paper introduces a novel RISC-V ISA extension—xTern—that integrates lightweight ternary neural network instructions for efficient edge inference.
  • It implements packed-SIMD MADD and specialized threshold-and-compress instructions, achieving up to 67% improved throughput with minimal hardware overhead.
  • End-to-end tests on CIFAR-10 and DVS gesture tasks demonstrate reduced latency and energy consumption while maintaining competitive accuracy.

xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems

The paper "xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems" presents an approach to enhancing the efficiency of ternary neural networks (TNNs) for edge AI applications. The work addresses the challenge of integrating TNN inference capabilities into the widely used, general-purpose RISC-V architecture without resorting to specialized hardware accelerators, which have so far limited broader adoption due to cost and design complexity.

Key Contributions

The authors introduce xTern, a minimalistic extension of the RISC-V instruction set architecture (ISA) aimed at facilitating efficient TNN inference on edge devices. Key contributions include:

  1. ISA Extension: xTern adds a set of lightweight instructions tailored for TNN operations. It includes packed-SIMD multiply-add (MADD) instructions, element-wise comparison, and a stateful threshold-and-compress instruction.
  2. Hardware Implementation: The xTern extension is incorporated into the RI5CY core and integrated into an eight-core cluster for evaluation. The modifications incur a minimal area overhead (<1%) with no impact on timing.
  3. Performance Optimization: The optimized kernel library leveraging xTern demonstrates a 67% increase in throughput for ternary convolutions compared to 2-bit kernels, along with a 57.1% improvement in energy efficiency.
  4. End-to-End Evaluations: Deploying TNNs for CIFAR-10 image classification and DVS gesture recognition on the xTern-enabled system, the results show notable improvements in latency, energy consumption, and accuracy trade-offs.

Detailed Analysis

ISA Extension and Instructions

The novel xTern ISA extension is specifically crafted for efficient ternary operations. The extension includes:

  • MADD Instructions: These perform 20-way packed-SIMD multiply-accumulate operations, significantly boosting parallel computation efficiency.
  • Element-wise Comparison: Instructions like min.t and max.t facilitate compact and efficient implementation of max-pooling and similar functions.
  • Threshold-and-Compress Instruction (thrc): This stateful and highly optimized instruction maps convolutions and activations directly to ternary values, reducing operation latency and complexity.
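As a rough scalar sketch of what a threshold-and-compress step computes, the following C code combines two-threshold ternarization with the five-trits-per-byte base-3 packing that makes 20-way SIMD over a 32-bit word possible. The function names, the per-channel thresholds, and the tie-breaking convention are illustrative assumptions, not the paper's exact `thrc` semantics:

```c
#include <stdint.h>

/* Map a convolution accumulator to a ternary value in {-1, 0, +1}
 * using two hypothetical per-channel thresholds lo < hi. */
int8_t ternarize(int32_t acc, int32_t lo, int32_t hi) {
    if (acc < lo)  return -1;
    if (acc >= hi) return  1;
    return 0;
}

/* Compress five ternary values into one byte: each trit t is stored as
 * the base-3 digit (t + 1), so a byte holds values 0..242 (3^5 - 1). */
uint8_t pack5(const int8_t t[5]) {
    uint8_t packed = 0;
    for (int i = 4; i >= 0; i--)
        packed = (uint8_t)(packed * 3 + (uint8_t)(t[i] + 1));
    return packed;
}
```

Since 3^5 = 243 fits in one byte, this packing stores 20 trits per 32-bit word, versus 16 values per word for a 2-bit format.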

Hardware Implementation

The xTern extension was synthesized in a GlobalFoundries 22nm FDX process, demonstrating negligible area overhead: the core area increased by only 3%, and the overall cluster area by 0.9%. Importantly, the system's maximum operating frequency remained unaffected, ensuring no performance degradation for non-TNN operations.

Power consumption analysis revealed a marginal increase of 5.2% when running ternary convolutions compared to running 2-bit convolutions on the baseline, while achieving significant gains in throughput.
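As a first-order sanity check on these figures, energy efficiency (work per joule) is throughput divided by power, so the reported numbers relate as:

```latex
\frac{\eta_{\text{tern}}}{\eta_{\text{2-bit}}}
  = \frac{T_{\text{tern}} / T_{\text{2-bit}}}{P_{\text{tern}} / P_{\text{2-bit}}}
  \approx \frac{1.67}{1.052} \approx 1.59
```

This is broadly consistent with the reported 57.1% energy-efficiency improvement; the residual difference plausibly reflects which kernels enter each average.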

Performance and Efficiency

Tests of the ternary convolution kernels showed substantial performance benefits over the 2-bit kernels from the PULP-NN library: throughput increased by 67% on average for ternary convolutions, with gains of 51% for layers whose channel counts are divisible by five, confirming the efficacy of the ternary compression strategy.

Detailed latency breakdown revealed that xTern's optimizations significantly reduce the time for matrix multiplications and thresholding operations, components that typically dominate convolutional layer execution time.
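To make the dominant matrix-multiplication component concrete, here is a scalar emulation of the packed ternary dot product that xTern's 20-way SIMD MADD instructions accelerate in hardware. It assumes the five-trits-per-byte base-3 packing; the function names are illustrative:

```c
#include <stdint.h>

/* Expand one packed byte into five ternary values in {-1, 0, +1}. */
void unpack5(uint8_t packed, int8_t t[5]) {
    for (int i = 0; i < 5; i++) {
        t[i] = (int8_t)(packed % 3) - 1;
        packed /= 3;
    }
}

/* Dot product of two ternary vectors, each stored five trits per byte. */
int32_t tern_dot(const uint8_t *a, const uint8_t *b, int n_bytes) {
    int32_t acc = 0;
    for (int i = 0; i < n_bytes; i++) {
        int8_t ta[5], tb[5];
        unpack5(a[i], ta);
        unpack5(b[i], tb);
        for (int j = 0; j < 5; j++)
            acc += (int32_t)ta[j] * tb[j];  /* each product is -1, 0, or +1 */
    }
    return acc;
}
```

In hardware, one 32-bit operand carries four such bytes (20 trits), so a single MADD instruction performs what this inner loop emulates over four bytes at once.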

End-to-End Network Inference

The evaluations on CIFAR-10 and DVS gesture recognition tasks underscore xTern's practical benefits. A VGG-like network for CIFAR-10 classification achieved up to 1.6 percentage points higher accuracy for ternary models at equivalent inference latency compared to 2-bit models, alongside significant energy-efficiency gains.

In the DVS gesture recognition task, xTern-enabled TNNs exhibited considerable reductions in inference latency (up to 37.4%) and energy consumption (33%) compared to their 2-bit counterparts, with negligible accuracy losses.

Implications and Future Directions

The successful integration of xTern into a RISC-V processor demonstrates a feasible path for deploying efficient TNNs on edge devices. This work opens avenues for enhancing the computational capabilities of low-power MCUs commonly used in IoT applications. As edge AI grows, adept support for efficient ternary operations could bridge the gap between performance demands and hardware constraints, leading to more responsive, energy-conserving AI systems at the edge.

Future work could explore further optimization of ternary data handling and extend xTern's ISA to other low-precision arithmetic operations. The impact of xTern on different neural network architectures, including more complex models and application domains beyond computer vision and gesture recognition, also presents an interesting direction for further research.

Overall, xTern positions RISC-V as a competitive architecture for edge AI, reflecting a strategic blend of flexibility, efficiency, and minimal hardware overhead.