Make RepVGG Greater Again: A Quantization-aware Approach (2212.01593v2)

Published 3 Dec 2022 in cs.CV

Abstract: The tradeoff between performance and inference speed is critical for practical applications. Architectural reparameterization achieves better tradeoffs and is becoming an increasingly popular ingredient in modern convolutional neural networks. Nonetheless, its quantized performance is usually too poor to deploy (a top-1 accuracy drop of more than 20% on ImageNet) when INT8 inference is desired. In this paper, we dive into the underlying mechanism of this failure: the original design inevitably enlarges quantization error. We propose a simple, robust, and effective remedy that yields a quantization-friendly structure while retaining the benefits of reparameterization. Our method substantially narrows the gap between INT8 and FP32 accuracy for RepVGG. Without bells and whistles, the top-1 accuracy drop on ImageNet is reduced to within 2% under standard post-training quantization. Moreover, our method achieves FP32 performance comparable to RepVGG. Extensive experiments on detection and semantic segmentation tasks verify its generalization.
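
To make the failure mode the abstract describes concrete, below is a minimal PyTorch sketch of the two ingredients it combines: RepVGG-style branch fusion (folding the 3x3, 1x1, and identity branches, each with its own BatchNorm, into a single 3x3 convolution) and standard symmetric per-tensor INT8 post-training quantization. The channel count, fabricated BN statistics, and the simple round-to-nearest quantizer are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def fuse_conv_bn(weight, bn):
    """Fold BatchNorm into the preceding conv: W' = W * g/std, b' = beta - mu * g/std."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = (bn.weight / std).detach()                       # per-channel gamma / sqrt(var + eps)
    return weight * scale.reshape(-1, 1, 1, 1), (bn.bias - bn.running_mean * scale).detach()

C = 16
w3 = torch.randn(C, C, 3, 3) * 0.1                           # 3x3 branch
w1 = torch.randn(C, C, 1, 1) * 0.1                           # 1x1 branch, zero-padded to 3x3 below
w_id = torch.zeros(C, C, 3, 3)                               # identity branch written as a 3x3 conv
for i in range(C):
    w_id[i, i, 1, 1] = 1.0                                   # 1 at the kernel centre, per channel

bns = [nn.BatchNorm2d(C).eval() for _ in range(3)]
for bn in bns:                                               # fake "trained" statistics (assumption)
    bn.running_var.uniform_(0.01, 4.0)
    bn.weight.data.uniform_(0.5, 2.0)

fw3, fb3 = fuse_conv_bn(w3, bns[0])
fw1, fb1 = fuse_conv_bn(F.pad(w1, [1, 1, 1, 1]), bns[1])     # zero-pad 1x1 kernel to 3x3
fwi, fbi = fuse_conv_bn(w_id, bns[2])

# After reparameterization the whole block is one 3x3 conv: the sum of the branches.
w_fused, b_fused = fw3 + fw1 + fwi, fb3 + fb1 + fbi

def int8_mse(w):
    """MSE of symmetric per-tensor INT8 round-to-nearest quantization."""
    s = w.abs().max() / 127.0                                # one scale for the whole tensor
    return (w - (w / s).round().clamp(-127, 127) * s).pow(2).mean()

print("INT8 MSE, 3x3 branch alone:", int8_mse(w3).item())
print("INT8 MSE, fused kernel    :", int8_mse(w_fused).item())
```

Under these assumptions, the BN-scaled identity and 1x1 contributions add large-magnitude entries at the kernel centres, widening the fused weight range and hence the per-tensor quantization step, so the fused kernel quantizes much worse than any single branch would. This is the kind of enlarged quantization error the paper attributes to the original design; its remedy is a block structure whose fused weights remain quantization-friendly.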

