
Faster Inference of Integer SWIN Transformer by Removing the GELU Activation (2402.01169v1)

Published 2 Feb 2024 in cs.CV and cs.AI

Abstract: The SWIN transformer is a prominent vision transformer model that achieves state-of-the-art accuracy in image classification tasks. Despite this success, its unique architecture causes slower inference compared with similar deep neural networks. Integer quantization of the model is one method used to improve its inference latency; however, state-of-the-art methods have not been able to fully quantize the model. In this work, we improve upon the inference latency of state-of-the-art methods by removing the floating-point operations associated with the GELU activation in the SWIN transformer. While previous work proposed replacing the non-integer operations with linear approximation functions, we propose replacing GELU with the ReLU activation. The advantage of ReLU over previous methods is its low memory and computational complexity. We use iterative knowledge distillation to compensate for the accuracy lost by replacing GELU with ReLU. We quantize our GELU-less SWIN transformer and show that, on an NVIDIA RTX 4090 GPU, we can improve the inference latency of the quantized SWIN transformer by at least $11\%$ while keeping the accuracy drop under $0.5\%$ on the ImageNet evaluation dataset.
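The recipe described in the abstract (swap GELU for ReLU, then distill from the original model to recover accuracy) can be sketched in a few lines of PyTorch. The sketch below is illustrative, not the authors' code: it uses torchvision's swin_t as a stand-in for the paper's SWIN model, performs a one-shot global swap rather than the paper's iterative, stage-by-stage replacement, and shows a single logit-matching distillation step with made-up hyperparameters.

# Minimal sketch of the GELU -> ReLU swap plus one knowledge-distillation
# step. Model choice, temperature, and learning rate are assumptions, not
# values from the paper; the paper's iterative schedule is not reproduced.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import swin_t  # stand-in for the paper's SWIN transformer

def replace_gelu_with_relu(model: nn.Module) -> nn.Module:
    """Recursively swap every nn.GELU module for nn.ReLU."""
    for name, child in model.named_children():
        if isinstance(child, nn.GELU):
            setattr(model, name, nn.ReLU(inplace=True))
        else:
            replace_gelu_with_relu(child)
    return model

teacher = swin_t(weights="IMAGENET1K_V1").eval()             # frozen GELU teacher
student = replace_gelu_with_relu(copy.deepcopy(teacher)).train()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(images: torch.Tensor, temperature: float = 2.0) -> float:
    """One KD step: match the student's softened logits to the teacher's."""
    with torch.no_grad():
        t_logits = teacher(images)
    s_logits = student(images)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one step on a random batch (stand-in for ImageNet data).
print(distill_step(torch.randn(8, 3, 224, 224)))

The appeal of ReLU for integer inference, as the abstract notes, is that max(x, 0) is exact in integer arithmetic, so no lookup tables or polynomial approximations of the kind used for GELU are needed after quantization.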

