Low-Precision Mixed-Computation Models for Inference on Edge (2312.02210v1)
Abstract: This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low bit-width) Posit and low-precision fixed-point (FixP) number systems. The mixed-computation approach employs 4-bit Posit (Posit4), which has higher precision around zero, to represent weights with high sensitivity, while it uses 4-bit FixP (FixP4) for the remaining weights. A heuristic that analyzes the importance and the quantization error of the weights is presented to assign the proper number system to each weight. Additionally, a gradient approximation for the Posit representation is introduced to improve the quality of weight updates during backpropagation. Because fully Posit-based computation has high energy consumption, neural network operations are carried out in FixP or mixed Posit/FixP. An efficient hardware implementation of a MAC operation whose first operand is in Posit and whose second operand and accumulator are in FixP is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed-computation approach is about 1.5% higher than that of FixP, at the cost of a 0.19% energy overhead.
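To make the mixed Posit4/FixP4 idea concrete, the sketch below enumerates the values representable by a small posit(nbits, es) code, quantizes a weight tensor to either 4-bit posit or 4-bit fixed point, and splits the weights between the two formats according to an importance score. This is a minimal illustration, not the paper's implementation: the choice of es, the per-tensor scaling, and the score (sensitivity times per-weight FixP quantization error) are assumptions, since the abstract does not give the exact heuristic.

```python
import numpy as np

def posit_values(nbits=4, es=0):
    """Enumerate every real value representable by a posit(nbits, es) code."""
    vals = []
    for code in range(1 << nbits):
        if code == 0:
            vals.append(0.0)
            continue
        if code == 1 << (nbits - 1):      # the single Not-a-Real (NaR) pattern
            continue
        sign = -1.0 if code & (1 << (nbits - 1)) else 1.0
        mag = code if sign > 0 else ((-code) & ((1 << nbits) - 1))  # two's-complement negate
        body = [(mag >> i) & 1 for i in range(nbits - 2, -1, -1)]   # bits after the sign, MSB first
        first, run = body[0], 1
        while run < len(body) and body[run] == first:               # regime run length
            run += 1
        k = (run - 1) if first == 1 else -run
        rest = body[run + 1:]                                       # skip the regime terminator
        exp_bits, frac_bits = rest[:es], rest[es:]
        exp = sum(b << (es - 1 - i) for i, b in enumerate(exp_bits))  # missing bits are zeros
        frac = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(frac_bits))
        useed = 2.0 ** (2 ** es)
        vals.append(sign * useed ** k * 2.0 ** exp * (1.0 + frac))
    return np.array(sorted(vals))

def quantize_fixp(w, nbits=4):
    """Symmetric fixed-point quantization with a single per-tensor scale."""
    qmax = 2 ** (nbits - 1) - 1
    wmax = float(np.max(np.abs(w)))
    scale = wmax / qmax if wmax > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def quantize_posit(w, nbits=4, es=0):
    """Round each weight to the nearest representable (per-tensor scaled) posit value."""
    grid = posit_values(nbits, es)
    wmax = float(np.max(np.abs(w)))
    scale = wmax / float(np.max(np.abs(grid))) if wmax > 0 else 1.0
    idx = np.abs(w[..., None] / scale - grid).argmin(axis=-1)
    return grid[idx] * scale

def mixed_quantize(w, sensitivity, frac_posit=0.5):
    """Illustrative assignment: route the highest-scoring weights to Posit4
    (finer resolution near zero) and the rest to FixP4. The score below is
    an assumed proxy for the paper's importance/quantization-error heuristic."""
    w_fixp, w_posit = quantize_fixp(w), quantize_posit(w)
    score = sensitivity * np.abs(w - w_fixp)
    cut = np.quantile(score, 1.0 - frac_posit)
    use_posit = score >= cut
    return np.where(use_posit, w_posit, w_fixp), use_posit
```

For instance, posit(4, 0) represents {0, ±0.25, ±0.5, ±0.75, ±1, ±1.5, ±2, ±4}: more codes cluster near zero than on a uniform 4-bit fixed-point grid, which is the property the abstract exploits for high-sensitivity weights.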