Papers
Topics
Authors
Recent
Search
2000 character limit reached

Investigating Transfer Learning Capabilities of Vision Transformers and CNNs by Fine-Tuning a Single Trainable Block

Published 11 Oct 2021 in cs.CV and cs.AI | (2110.05270v1)

Abstract: In recent developments in the field of Computer Vision, a rise is seen in the use of transformer-based architectures. They are surpassing the state-of-the-art set by CNN architectures in accuracy but on the other hand, they are computationally very expensive to train from scratch. As these models are quite recent in the Computer Vision field, there is a need to study it's transfer learning capabilities and compare it with CNNs so that we can understand which architecture is better when applied to real world problems with small data. In this work, we follow a simple yet restrictive method for fine-tuning both CNN and Transformer models pretrained on ImageNet1K on CIFAR-10 and compare them with each other. We only unfreeze the last transformer/encoder or last convolutional block of a model and freeze all the layers before it while adding a simple MLP at the end for classification. This simple modification lets us use the raw learned weights of both these neural networks. From our experiments, we find out that transformers-based architectures not only achieve higher accuracy than CNNs but some transformers even achieve this feat with around 4 times lesser number of parameters.

Citations (5)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.