Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity (2404.09497v1)
Abstract: Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations on randomly distributed zero bits can significantly boost computational efficiency, yet traditional digital SRAM-PIM architectures, constrained by their rigid crossbar structure, struggle to exploit this unstructured sparsity effectively. To address this challenge, we propose Dyadic Block PIM (DB-PIM), an algorithm-architecture co-design framework. First, we propose an algorithm coupled with a distinctive sparsity pattern, termed the dyadic block (DB), which preserves the random distribution of non-zero bits to maintain accuracy while restricting the number of these bits in each weight to improve regularity. Architecturally, we develop a custom PIM macro comprising dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, specifically tailored for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) further improves performance and efficiency by capitalizing on block-wise input sparsity. Results show that the proposed co-design framework achieves a speedup of up to 7.69x and energy savings of 83.43%.
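To make the bit-level sparsity constraint concrete, the sketch below is a hypothetical illustration (not the paper's published DB-PIM algorithm; the function names, the non-adjacent-form recoding, and the keep-most-significant-digits policy are assumptions) of the general idea the abstract describes: recode a weight into Canonical Signed Digit form and cap the number of non-zero signed digits per weight, while leaving the positions of the surviving digits unconstrained.

```python
# Hypothetical sketch: constrain a weight to at most `max_nonzero` non-zero
# signed digits of its Canonical Signed Digit (CSD / non-adjacent form)
# representation. This mimics "restrict the number of non-zero bits per
# weight, keep their positions unstructured"; it is not the paper's exact method.

def to_csd(value: int, bits: int = 8):
    """Return CSD digits (each in {-1, 0, +1}) of a non-negative int, LSB first."""
    digits = []
    v = value
    for _ in range(bits + 1):
        if v % 2 == 0:
            d = 0
        else:
            # Choose +1 or -1 so the remainder stays even: v%4==1 -> +1, v%4==3 -> -1.
            d = 2 - (v % 4)
            v -= d
        digits.append(d)
        v //= 2
    return digits

def constrain_nonzero_digits(digits, max_nonzero: int):
    """Keep only the `max_nonzero` most significant non-zero digits; zero the rest."""
    kept = 0
    out = [0] * len(digits)
    for i in reversed(range(len(digits))):  # scan MSB -> LSB
        if digits[i] != 0 and kept < max_nonzero:
            out[i] = digits[i]
            kept += 1
    return out

def from_digits(digits):
    """Reassemble the integer value from LSB-first signed digits."""
    return sum(d << i for i, d in enumerate(digits))

if __name__ == "__main__":
    w = 109                     # 0b1101101: five 1-bits in binary, four digits in CSD
    csd = to_csd(w)
    approx = from_digits(constrain_nonzero_digits(csd, max_nonzero=2))
    print(w, csd, approx)       # 109 approximated as 112 with only two signed digits
```

Because each constrained weight multiplies the input by only a few signed power-of-two terms, a MAC reduces to a small, fixed number of shift-and-add (or subtract) operations, which is what a CSD-based adder tree in hardware would exploit.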