Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning (2405.10658v1)
Abstract: Convolutional Neural Networks (CNNs) have become integral to safety-critical applications, raising concerns about their fault tolerance. Conventional hardware-dependent fault tolerance methods, such as Triple Modular Redundancy (TMR), are computationally expensive and impose a substantial overhead on CNNs. While fault tolerance techniques can be applied either at the hardware level or at the model level, the latter provides more flexibility without sacrificing generality. This paper introduces a model-level hardening approach for CNNs that integrates error correction directly into the network. The approach is hardware-agnostic and requires no changes to the underlying accelerator device. Analyzing the vulnerability of parameters enables selective duplication of filters/neurons so that their output channels can be corrected by an efficient and robust correction layer. The proposed method achieves fault resilience nearly equivalent to TMR-based correction at a significantly reduced overhead. Nevertheless, hardening still adds some overhead relative to the baseline CNN. To address this, a cost-effective parameter-vulnerability-based pruning technique is proposed that outperforms conventional pruning, yielding smaller networks with negligible accuracy loss. Remarkably, the hardened pruned CNNs run up to 24% faster than the hardened un-pruned ones.
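To make the hardening idea concrete, the sketch below illustrates the two ingredients the abstract describes: duplicating the most vulnerable filters of a convolutional layer and merging each redundant channel pair in a lightweight correction layer. This is a minimal PyTorch sketch, not the authors' implementation: the L1-weight-mass vulnerability proxy, the min-magnitude merge rule, and the helper names (`duplicate_vulnerable_filters`, `CorrectionLayer`) are illustrative assumptions.

```python
# Minimal sketch (assumptions noted): selective filter duplication plus a
# simple correction layer for a Conv2d, in PyTorch.
import torch
import torch.nn as nn


def duplicate_vulnerable_filters(conv: nn.Conv2d, k: int) -> tuple[nn.Conv2d, torch.Tensor]:
    """Return a widened conv whose k most 'vulnerable' filters are duplicated,
    plus the indices of the duplicated output channels."""
    # Hypothetical vulnerability proxy: filters with larger L1 weight mass are
    # assumed to distort the output more when a weight bit is corrupted.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    top = torch.topk(scores, k).indices
    widened = nn.Conv2d(conv.in_channels, conv.out_channels + k,
                        conv.kernel_size, conv.stride, conv.padding,
                        bias=conv.bias is not None)
    with torch.no_grad():
        widened.weight[: conv.out_channels] = conv.weight
        widened.weight[conv.out_channels:] = conv.weight[top]  # duplicated filters
        if conv.bias is not None:
            widened.bias[: conv.out_channels] = conv.bias
            widened.bias[conv.out_channels:] = conv.bias[top]
    return widened, top


class CorrectionLayer(nn.Module):
    """Merge each duplicated channel with its original copy. Assumed rule:
    keep the element with the smaller magnitude, since a weight bit flip
    typically inflates values in the corrupted channel."""

    def __init__(self, out_channels: int, dup_idx: torch.Tensor):
        super().__init__()
        self.out_channels = out_channels
        self.register_buffer("dup_idx", dup_idx)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y, dup = x[:, : self.out_channels], x[:, self.out_channels:]
        orig = y[:, self.dup_idx]
        corrected = torch.where(orig.abs() <= dup.abs(), orig, dup)
        y = y.clone()
        y[:, self.dup_idx] = corrected
        return y


# Usage on a toy layer: the hardened conv is only k channels wider, and the
# correction layer restores the original channel count.
conv = nn.Conv2d(3, 16, 3, padding=1)
hardened, idx = duplicate_vulnerable_filters(conv, k=4)
fixer = CorrectionLayer(conv.out_channels, idx)
out = fixer(hardened(torch.randn(1, 3, 32, 32)))  # shape (1, 16, 32, 32)
```

Unlike TMR, which triplicates every computation, this scheme pays the redundancy cost only for the k channels the vulnerability analysis flags, which is consistent with the abstract's claim of near-TMR resilience at a fraction of the overhead.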