
xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems (2405.19065v1)

Published 29 May 2024 in cs.AR and cs.LG

Abstract: Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.


Summary

  • The paper introduces a novel RISC-V ISA extension—xTern—that integrates lightweight ternary neural network instructions for efficient edge inference.
  • It implements packed-SIMD MADD and specialized threshold-and-compress instructions, achieving up to 67% improved throughput with minimal hardware overhead.
  • End-to-end tests on CIFAR-10 and DVS gesture tasks demonstrate reduced latency and energy consumption while maintaining competitive accuracy.

xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems

The paper "xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems" presents an approach to enhancing the efficiency of ternary neural networks (TNNs) for edge AI applications. The work addresses the challenge of integrating TNN inference capabilities into the widely used, general-purpose RISC-V architecture without resorting to specialized hardware accelerators, which have so far limited broader adoption due to cost and design complexity.

Key Contributions

The authors introduce xTern, a minimalistic extension of the RISC-V instruction set architecture (ISA) aimed at facilitating efficient TNN inference on edge devices. Key contributions include:

  1. ISA Extension: xTern adds a set of lightweight instructions tailored for TNN operations. It includes packed-SIMD multiply-add (MADD) instructions, element-wise comparison, and a stateful threshold-and-compress instruction.
  2. Hardware Implementation: The xTern extension is incorporated into the RI5CY core and integrated into an eight-core cluster for evaluation. The modifications incur a minimal area overhead (<1%) with no impact on timing.
  3. Performance Optimization: The optimized kernel library leveraging xTern demonstrates a 67% increase in throughput for ternary convolutions compared to 2-bit kernels, along with a 57.1% improvement in energy efficiency.
  4. End-to-End Evaluations: Deploying TNNs for CIFAR-10 image classification and DVS gesture recognition on the xTern-enabled system, the results show notable improvements in latency, energy consumption, and accuracy trade-offs.

Detailed Analysis

ISA Extension and Instructions

The novel xTern ISA extension is specifically crafted for efficient ternary operations. The extension includes:

  • MADD Instructions: These perform 20-way packed-SIMD multiply-accumulate operations, significantly boosting parallel computation efficiency.
  • Element-wise Comparison: Instructions like min.t and max.t facilitate compact and efficient implementation of max-pooling and similar functions.
  • Threshold-and-Compress Instruction (thrc): This stateful and highly optimized instruction maps convolutions and activations directly to ternary values, reducing operation latency and complexity.
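As a rough scalar sketch of what a threshold-and-compress step computes, the following C code combines two-threshold ternarization with the five-trits-per-byte base-3 packing that makes 20-way SIMD over a 32-bit word possible. The function names, the per-channel thresholds, and the tie-breaking convention are illustrative assumptions, not the paper's exact `thrc` semantics:

```c
#include <stdint.h>

/* Map a convolution accumulator to a ternary value in {-1, 0, +1}
 * using two hypothetical per-channel thresholds lo < hi. */
int8_t ternarize(int32_t acc, int32_t lo, int32_t hi) {
    if (acc < lo)  return -1;
    if (acc >= hi) return  1;
    return 0;
}

/* Compress five ternary values into one byte: each trit t is stored as
 * the base-3 digit (t + 1), so a byte holds values 0..242 (3^5 - 1). */
uint8_t pack5(const int8_t t[5]) {
    uint8_t packed = 0;
    for (int i = 4; i >= 0; i--)
        packed = (uint8_t)(packed * 3 + (uint8_t)(t[i] + 1));
    return packed;
}
```

Since 3^5 = 243 fits in one byte, this packing stores 20 trits per 32-bit word, versus 16 values per word for a 2-bit format.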

Hardware Implementation

The xTern extension was synthesized in a GlobalFoundries 22nm FDX process, demonstrating negligible area overhead: the core area increased by only 3%, and the overall cluster area by 0.9%. Importantly, the system's maximum operating frequency remained unaffected, ensuring no performance degradation for non-TNN operations.

Power consumption analysis revealed a marginal increase of 5.2% when running ternary convolutions compared to running 2-bit convolutions on the baseline, while achieving significant gains in throughput.
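As a first-order sanity check on these figures, energy efficiency (work per joule) is throughput divided by power, so the reported numbers relate as:

```latex
\frac{\eta_{\text{tern}}}{\eta_{\text{2-bit}}}
  = \frac{T_{\text{tern}} / T_{\text{2-bit}}}{P_{\text{tern}} / P_{\text{2-bit}}}
  \approx \frac{1.67}{1.052} \approx 1.59
```

This is broadly consistent with the reported 57.1% energy-efficiency improvement; the residual difference plausibly reflects which kernels enter each average.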

Performance and Efficiency

Tests of the ternary convolution kernels showed substantial performance benefits over the 2-bit kernels from the PULP-NN library: throughput increased by 67% on average for ternary convolutions, with gains of 51% for layers whose channel counts are divisible by five, confirming the efficacy of the ternary compression strategy.

Detailed latency breakdown revealed that xTern's optimizations significantly reduce the time for matrix multiplications and thresholding operations, components that typically dominate convolutional layer execution time.
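To make the dominant matrix-multiplication component concrete, here is a scalar emulation of the packed ternary dot product that xTern's 20-way SIMD MADD instructions accelerate in hardware. It assumes the five-trits-per-byte base-3 packing; the function names are illustrative:

```c
#include <stdint.h>

/* Expand one packed byte into five ternary values in {-1, 0, +1}. */
void unpack5(uint8_t packed, int8_t t[5]) {
    for (int i = 0; i < 5; i++) {
        t[i] = (int8_t)(packed % 3) - 1;
        packed /= 3;
    }
}

/* Dot product of two ternary vectors, each stored five trits per byte. */
int32_t tern_dot(const uint8_t *a, const uint8_t *b, int n_bytes) {
    int32_t acc = 0;
    for (int i = 0; i < n_bytes; i++) {
        int8_t ta[5], tb[5];
        unpack5(a[i], ta);
        unpack5(b[i], tb);
        for (int j = 0; j < 5; j++)
            acc += (int32_t)ta[j] * tb[j];  /* each product is -1, 0, or +1 */
    }
    return acc;
}
```

In hardware, one 32-bit operand carries four such bytes (20 trits), so a single MADD instruction performs what this inner loop emulates over four bytes at once.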

End-to-End Network Inference

The evaluations on CIFAR-10 and DVS gesture recognition tasks underscore xTern's practical benefits. A VGG-like network for CIFAR-10 classification achieved up to 1.6 percentage points higher accuracy for ternary models at equivalent inference latency compared to 2-bit models, alongside significant energy-efficiency gains.

In the DVS gesture recognition task, xTern-enabled TNNs exhibited considerable reductions in inference latency (up to 37.4%) and energy consumption (33%) compared to their 2-bit counterparts, with negligible accuracy losses.

Implications and Future Directions

The successful integration of xTern into a RISC-V processor demonstrates a feasible path for deploying efficient TNNs on edge devices. This work opens avenues for enhancing the computational capabilities of low-power MCUs commonly used in IoT applications. As edge AI grows, adept support for efficient ternary operations could bridge the gap between performance demands and hardware constraints, leading to more responsive, energy-conserving AI systems at the edge.

Future work could explore further optimization of ternary data handling and extend xTern's ISA to other low-precision arithmetic operations. The impact of xTern on different neural network architectures, including more complex models and application domains beyond computer vision and gesture recognition, also presents an interesting direction for further research.

Overall, xTern positions RISC-V as a competitive architecture for edge AI, reflecting a strategic blend of flexibility, efficiency, and minimal hardware overhead.