Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array (2202.00389v2)

Published 1 Feb 2022 in cs.AR

Abstract: Sparsity is an intrinsic property of convolutional neural networks (CNNs) and is worth exploiting in CNN accelerators, but the extra processing it requires adds hardware overhead, so many architectures gain only marginally from it. Meanwhile, systolic arrays have become increasingly competitive for CNN acceleration owing to their high spatiotemporal locality and low hardware overhead. However, the irregularity of sparsity induces an imbalanced workload under the rigid systolic dataflow, which degrades performance. This paper therefore proposes Sense, a systolic-array-based architecture for sparse CNN acceleration built through model-hardware co-design, which achieves a large performance improvement. To balance input feature map (IFM) and weight loads across the Processing Element (PE) array, we apply channel clustering to gather IFMs of similar sparsity for array computation, and co-design a load-balancing weight pruning method that keeps the sparsity ratio of each kernel at a fixed value with little accuracy loss, improving PE utilization and overall performance. Additionally, Adaptive Dataflow Configuration determines the computing strategy based on the storage ratio of IFMs and weights, reducing DRAM access by 1.17x-1.8x compared with Swallow and further lowering system energy consumption. The whole design is implemented on a Zynq ZCU102 at 200 MHz and achieves 471, 34, 53, and 191 images/s for AlexNet, VGG-16, ResNet-50, and GoogleNet, respectively. Compared with the sparse systolic-array-based accelerators Swallow, FESA, and SPOTS, Sense achieves 1x-2.25x, 1.95x-2.5x, and 1.17x-2.37x performance improvement on these CNNs, respectively, with reasonable overhead.
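The two balancing ideas named in the abstract can be illustrated with a minimal Python sketch. The abstract does not describe how Sense selects weights to prune or how it forms channel clusters, so everything here is an assumption made for illustration: the function names `prune_balanced` and `cluster_channels` are hypothetical, pruning is done by weight magnitude, and clustering is done by sorting channels on their fraction of zeros. It is not the authors' implementation.

```python
def prune_balanced(kernels, sparsity):
    """Zero the smallest-magnitude weights of each kernel (assumed criterion).

    Each kernel is a flat list of weights. Every kernel loses the same
    fraction of weights, so all kernels of equal size end up with the same
    number of nonzeros (assuming the inputs are dense to begin with).
    """
    pruned = []
    for kernel in kernels:
        n_prune = int(len(kernel) * sparsity)
        # Indices in ascending order of magnitude; the first n_prune are dropped.
        order = sorted(range(len(kernel)), key=lambda i: abs(kernel[i]))
        drop = set(order[:n_prune])
        pruned.append([0.0 if i in drop else w for i, w in enumerate(kernel)])
    return pruned


def cluster_channels(channels, group_size):
    """Group IFM channel indices so each group has similar sparsity.

    Each channel is a flat list of activations. Channels are sorted by their
    fraction of zeros and then split into consecutive groups of group_size.
    """
    def sparsity_of(channel):
        return sum(1 for v in channel if v == 0) / len(channel)

    order = sorted(range(len(channels)), key=lambda i: sparsity_of(channels[i]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


if __name__ == "__main__":
    kernels = [[0.5, -0.1, 0.3, 0.05], [1.0, 2.0, -3.0, 0.2]]
    print(prune_balanced(kernels, 0.5))

    channels = [[0, 0, 0, 1], [1, 2, 3, 4], [0, 0, 1, 1], [0, 1, 1, 1]]
    print(cluster_channels(channels, 2))
```

With equal nonzero counts per kernel and similarly sparse channels per group, the PEs that operate in lockstep would receive roughly equal amounts of work, which is the effect the abstract attributes to these two techniques.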

Citations (11)

