
MABViT -- Modified Attention Block Enhances Vision Transformers (2312.01324v2)

Published 3 Dec 2023 in cs.CV and cs.LG

Abstract: Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in enhancing transformer models, particularly in LLMs. Additionally, utilizing a parallel configuration within each Transformer block rather than the conventional serialized method has been shown to accelerate the training of LLMs without significantly impacting performance. However, when the MLP and attention block were run in parallel for the image classification task, we observed a noticeable decline in performance. We propose a novel transformer variant that integrates non-linearity within the attention block to tackle this problem. We apply the GLU-based activation function to the Value tensor, and this new technique surpasses the current state-of-the-art S/16 variant of Vision Transformers by 0.6% on the ImageNet-1K dataset while utilizing fewer parameters. It also surpasses the B/16 variant while using only half the parameters. Furthermore, we provide results with the GELU activation function variant to confirm our assertions. Lastly, we showcase that the MABViT variants exhibit greater potential when utilized in deep transformers compared to the standard architecture.
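
The core idea described in the abstract, applying a GLU-based activation to the Value tensor inside the attention block, can be illustrated with a short PyTorch sketch. This is a minimal reconstruction under stated assumptions: the module name `MABAttention`, the SwiGLU-style pairing of a value projection with a gate projection, the SiLU gating activation, and all hyperparameters are illustrative choices, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MABAttention(nn.Module):
    """Multi-head self-attention with a GLU-style gate on the Value tensor.

    Illustrative sketch only: the paper's exact gating formulation and
    activation may differ from the SwiGLU-style choice assumed here.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        # Two value-side projections: a "content" path and a "gate" path,
        # combined multiplicatively as a gated linear unit (assumption).
        self.v_proj = nn.Linear(dim, dim)
        self.v_gate = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        q = self.q_proj(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Non-linearity inside the attention block:
        # V = (x W_v) * SiLU(x W_g), i.e. a GLU applied to the Value tensor.
        v = self.v_proj(x) * F.silu(self.v_gate(x))
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.out_proj(out)


if __name__ == "__main__":
    # Example: a batch of 2 sequences of 197 patch tokens with dim 384 (ViT-S-like sizes).
    block = MABAttention(dim=384, num_heads=6)
    tokens = torch.randn(2, 197, 384)
    print(block(tokens).shape)  # torch.Size([2, 197, 384])
```

In this sketch the extra gate projection adds parameters to the attention block; the abstract's parameter comparisons against the S/16 and B/16 baselines presumably account for the full model budget, which is not reproduced here.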
