
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference (2405.14700v2)

Published 23 May 2024 in cs.CV

Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-trained Vision Transformer (ViT) models to downstream applications. While current PEFT methods achieve parameter efficiency, they overlook the efficiency of computation and GPU memory during both fine-tuning and inference, falling short of practical requirements. In this paper, we propose Sparse-Tuning, a novel PEFT method that exploits the information redundancy in images and videos to improve both kinds of efficiency. By sparsely preserving the semantically relevant tokens and merging irrelevant ones, Sparse-Tuning minimizes the number of tokens processed at each layer, leading to a quadratic reduction in computational and memory overhead. To align this token sparsification strategy with the goals of fine-tuning, we further design Dense Adapters that establish dense connections from shallow layers to deeper layers. These Dense Adapters integrate multi-level local features to enrich the current tokens, improving both token preservation and model adaptation. Empirical results on VTAB-1K, three image datasets, and two video datasets show that Sparse-Tuning reduces GFLOPs to 62%-70% of the original ViT-B while achieving state-of-the-art performance. Source code is available at https://github.com/liuting20/Sparse-Tuning.
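
The abstract describes two mechanisms. The first, preserving semantically relevant tokens and merging the rest, could look roughly like the PyTorch sketch below; the function name, the use of CLS-to-patch attention as the relevance score, the keep ratio, and the single merged token are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def sparsify_tokens(tokens, scores, keep_ratio=0.7):
    """Keep the highest-scoring tokens and merge the rest into one token.

    tokens: (B, N, D) patch-token embeddings (CLS token handled separately)
    scores: (B, N) per-token relevance, e.g. CLS-to-patch attention
    """
    B, N, D = tokens.shape
    k = max(1, int(N * keep_ratio))

    # Preserve the k most semantically relevant tokens.
    idx = scores.topk(k, dim=1).indices                           # (B, k)
    kept = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))  # (B, k, D)

    # Merge the remaining tokens into a single token, weighted by score.
    rest_mask = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
    rest_mask.scatter_(1, idx, False)
    rest = tokens[rest_mask].view(B, N - k, D)
    w = scores[rest_mask].view(B, N - k, 1).softmax(dim=1)
    merged = (w * rest).sum(dim=1, keepdim=True)                  # (B, 1, D)

    # Subsequent layers attend over k+1 tokens instead of N, so the
    # quadratic attention cost shrinks accordingly.
    return torch.cat([kept, merged], dim=1)
```

The second mechanism, Dense Adapters, can be sketched as a bottleneck adapter fed by features from several shallower layers. Again this is a hedged illustration under assumed details (the summation fusion, the bottleneck width, and the class name are hypothetical), not the authors' implementation.

```python
import torch.nn as nn

class DenseAdapter(nn.Module):
    """Sketch of an adapter densely connected to earlier layers.

    Down-projects features from several shallow layers, fuses them, and
    up-projects back to the model dimension; the caller adds the result
    to the current tokens as a residual. Assumes the inputs share the
    same token layout.
    """
    def __init__(self, dim=768, bottleneck=64, num_sources=3):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Linear(dim, bottleneck) for _ in range(num_sources)
        )
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, feats):
        # feats: list of (B, N, dim) tensors from shallow layers.
        fused = sum(down(f) for down, f in zip(self.down, feats))
        return self.up(self.act(fused))
```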

Citations (2)

