xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems (2405.19065v1)
Abstract: Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption increases only marginally, by 5.2%, resulting in an energy efficiency improvement of 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, xTern enables the deployment of TNNs that achieve up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.
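To make the comparison between ternary and 2-bit networks concrete, the sketch below shows one common way ternary weights in {-1, 0, +1} can be packed two bits each and consumed in a dot product. This is a conceptual reference implementation only, not the paper's actual xTern instructions or kernels; the 2-bit encoding (`0b00` → 0, `0b01` → +1, `0b10` → -1) and both function names are assumptions chosen for illustration.

```python
# Conceptual sketch of ternary-weight storage and a dot product.
# NOT the xTern ISA extension itself: encoding and helpers are assumed.

def pack_ternary(weights):
    """Pack ternary weights (-1, 0, +1) into bytes, four per byte,
    two bits per weight. Assumed encoding: 00 -> 0, 01 -> +1, 10 -> -1."""
    codes = {0: 0b00, 1: 0b01, -1: 0b10}
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= codes[w] << (2 * j)
        packed.append(byte)
    return bytes(packed)

def ternary_dot(packed, activations):
    """Unpack 2-bit ternary weights and accumulate an integer dot
    product with the activations (scalar reference, no SIMD)."""
    decode = {0b00: 0, 0b01: 1, 0b10: -1}
    acc = 0
    for i, a in enumerate(activations):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        acc += decode[code] * a
    return acc

w = [1, -1, 0, 1, -1, 0, 1, 1]
x = [3, 5, 7, 2, 4, 6, 1, 8]
print(ternary_dot(pack_ternary(w), x))  # prints 5, i.e. sum(wi * xi)
```

A scalar loop like this is exactly what an ISA extension such as xTern avoids: on hardware, the unpack-multiply-accumulate over packed ternary operands would be fused into wide SIMD instructions rather than executed element by element.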