Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey (2403.14608v7)
Abstract: Large models represent a groundbreaking advancement across multiple application fields, enabling remarkable achievements on a wide range of tasks. Their unprecedented scale, however, comes with substantial computational costs: these models, often consisting of billions of parameters, require vast computational resources to run. The scale and computational demands pose particular challenges when customizing such models for specific downstream tasks, especially on hardware platforms with limited computational capability. Parameter-Efficient Fine-Tuning (PEFT) offers a practical solution by efficiently adapting large models to diverse downstream tasks. Specifically, PEFT refers to adjusting the parameters of a pre-trained large model for a target task or domain while minimizing the number of additional parameters introduced and the computational resources required. This is particularly important for large language models (LLMs) with high parameter counts, where full fine-tuning is computationally expensive and resource-intensive and places considerable demands on the design of the supporting system platform. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. We also provide an overview of applications built on different PEFT algorithms and discuss common techniques for mitigating the computational cost of PEFT. Beyond the algorithmic perspective, we examine real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as a valuable resource for researchers aiming to understand both PEFT algorithms and their system implementations, offering detailed …
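To make the PEFT idea in the abstract concrete, the sketch below shows a minimal LoRA-style adapter (one of the family of methods a survey like this covers): the pre-trained weights are frozen and only a small low-rank update is trained. This is an illustrative sketch in PyTorch, not an implementation from the paper; the class name `LoRALinear` and the `rank`/`alpha` hyperparameters are chosen here for clarity.

```python
# Minimal LoRA-style PEFT sketch (illustrative): freeze a pre-trained linear layer
# and train only a low-rank update, y = W x + (alpha / r) * B A x.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):  # hypothetical helper, not from the survey
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        in_f, out_f = base.in_features, base.out_features
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))        # up-projection, zero init
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

# Example: wrap one 768x768 projection and compare trainable vs. total parameters.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / total: {total}")  # ~12K trainable vs. ~600K total
```

Only the two low-rank matrices receive gradients, which is the parameter- and memory-saving effect the abstract describes; other PEFT families surveyed (adapters, prompt/prefix tuning, selective tuning) achieve the same goal by inserting small modules or tuning small parameter subsets instead.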