
Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

Published 8 Jul 2023 in cs.AR, cs.AI, and cs.ET | (arXiv:2307.03936v1)

Abstract: The amount of data processed in the cloud, the growth of Internet-of-Things (IoT) applications, and rising data privacy concerns are driving a transition from cloud-based to edge-based processing. The limited energy and computational resources available at the edge push a further transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques make it possible to implement a neural network on limited hardware resources. Quantization is among the most efficient of these techniques, reducing memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNNs) and links software-based quantization approaches to their IMC hardware implementations. Moreover, open challenges, QNN design requirements, recommendations, and perspectives are discussed, along with a roadmap for IMC-based QNN hardware.
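To make the quantization idea concrete, here is a minimal sketch (not drawn from the paper itself) of symmetric, per-tensor uniform weight quantization in NumPy. The function names, the 4-bit setting, and the per-tensor scaling scheme are illustrative assumptions; the paper surveys a much broader range of quantization methods, including mixed-precision and binary/ternary schemes.

```python
import numpy as np

def quantize_uniform(w, n_bits=4):
    """Symmetric, per-tensor uniform quantization (illustrative sketch).

    Maps fp32 weights to signed integers in
    [-2**(n_bits-1), 2**(n_bits-1) - 1], storing only the integer
    codes plus one fp32 scale instead of the full-precision tensor.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax               # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Example: a 4-bit representation needs ~1/8 the storage of fp32,
# at the cost of a small reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_uniform(w, n_bits=4)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean absolute quantization error: {err:.4f}")
```

In an IMC setting, the integer codes would be mapped onto memory-cell conductance levels (or bit-sliced across cells), which is why the achievable bit-width is tightly coupled to device precision and non-idealities.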
