
A 1.6-mW Sparse Deep Learning Accelerator for Speech Separation (2312.09580v1)

Published 15 Dec 2023 in cs.SD, cs.AR, and eess.AS

Abstract: Low-power deep learning accelerators for speech processing enable real-time applications on edge devices. However, most existing accelerators suffer from high power consumption and target image applications only. This paper presents a low-power accelerator for speech separation through algorithm and hardware optimizations. At the algorithm level, the model is compressed with sensitivity-based structured pruning as well as unstructured pruning, and further quantized from the 32-bit floating-point format to a shifted 8-bit floating-point format. Computations involving zero kernel and zero activation values are skipped by decomposing the dilated and transposed convolutions. At the hardware level, the compressed model is supported by an architecture with eight independent multiply-and-accumulate units (MACs) and simple zero-skipping hardware that exploits activation sparsity for low-power processing. The proposed approach reduces the model size by 95.44% and the computational complexity by 93.88%. The final implementation in the TSMC 40 nm process achieves real-time speech separation and consumes 1.6 mW when operated at 150 MHz. The normalized energy efficiency and area efficiency are 2.344 TOPS/W and 14.42 GOPS/mm², respectively.
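The idea behind the shifted 8-bit floating-point quantization mentioned in the abstract can be sketched in software: a per-tensor exponent shift re-centres the narrow FP8 dynamic range on the data before each value is rounded to a short mantissa. This is a minimal illustrative simulation, not the paper's implementation; the field widths (1 sign bit, 4 exponent bits, 3 mantissa bits) and the shift-selection rule are assumptions for illustration.

```python
import numpy as np

def shifted_fp8(x, exp_bits=4, man_bits=3):
    """Simulate quantization to a shifted 8-bit float.

    A per-tensor exponent shift moves the representable range so that the
    largest magnitude in `x` lands near the top of the exponent range.
    Field widths and the shift rule are illustrative assumptions.
    """
    x = np.asarray(x, dtype=np.float64)
    max_exp_field = 2 ** exp_bits - 1
    max_val = np.max(np.abs(x))
    # Pick the shift so the largest magnitude uses the highest exponent code.
    shift = int(np.floor(np.log2(max_val))) - (max_exp_field - 1) if max_val > 0 else 0

    sign = np.sign(x)
    mag = np.abs(x)
    out = np.zeros_like(x)
    nz = mag > 0
    # Per-value exponent, clamped to the shifted representable window.
    e = np.clip(np.floor(np.log2(mag[nz])), shift, shift + max_exp_field)
    step = 2.0 ** (e - man_bits)          # quantization step at that exponent
    out[nz] = np.round(mag[nz] / step) * step
    return sign * out
```

Values already expressible with a 3-bit mantissa (e.g. 0.5 or -1.25) pass through exactly, while other values are rounded with a relative error bounded by half a mantissa step.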

