Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (2312.06353v5)
Abstract: Pre-trained LLMs require fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance achievable with full-parameter tuning. However, federated full-parameter tuning of LLMs is a non-trivial problem due to the immense communication cost. This work introduces FedKSeed, which employs zeroth-order optimization with a finite set of random seeds. It reduces the transmission between the server and clients to just a few random seeds and scalar gradients, amounting to only a few thousand bytes, making federated full-parameter tuning of billion-sized LLMs feasible on devices. Building on FedKSeed, we develop a strategy for probability-differentiated seed sampling that prioritizes perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets, and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and new-task generalization.
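The abstract's core mechanism, replacing full-gradient exchange with a shared pool of random seeds plus per-step scalar gradients, can be illustrated with a small sketch. The code below is an assumption-laden illustration, not the authors' implementation: the pool size `K`, perturbation scale `EPS`, learning rate `LR`, the toy quadratic loss, and all helper names are hypothetical, and seeds are sampled uniformly rather than with the paper's probability-differentiated strategy.

```python
# Minimal sketch of the seed-and-scalar idea described in the abstract (NumPy only).
# All constants and helper names are illustrative assumptions.
import numpy as np

K = 4096          # size of the finite candidate-seed pool (assumed value)
EPS = 1e-3        # zeroth-order perturbation scale (assumed value)
LR = 1e-4         # learning rate (assumed value)
DIM = 10_000      # stand-in for the model's full parameter count

candidate_seeds = np.arange(K)  # shared by the server and every client

def perturbation(seed: int) -> np.ndarray:
    """Regenerate the same random direction on any party from its seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(DIM)

def toy_loss(params: np.ndarray) -> float:
    """Placeholder for a client's local loss on its private data."""
    return float(np.sum(params ** 2))

def client_zo_step(params: np.ndarray, seed: int) -> float:
    """Zeroth-order estimate of the directional derivative along z(seed)."""
    z = perturbation(seed)
    grad_scalar = (toy_loss(params + EPS * z) - toy_loss(params - EPS * z)) / (2 * EPS)
    return grad_scalar  # only this scalar (plus the seed id) needs to be communicated

def replay_updates(params: np.ndarray, history: list[tuple[int, float]]) -> np.ndarray:
    """Rebuild the current global model by replaying (seed, scalar) pairs."""
    for seed, g in history:
        params = params - LR * g * perturbation(seed)
    return params

# One communication round with two clients, each taking one local ZO step
# from the same starting point (replayed from previous rounds' history).
prev_history: list[tuple[int, float]] = []      # (seed, scalar) pairs from past rounds
round_payload: list[tuple[int, float]] = []     # what clients upload this round
for _ in range(2):                              # two participating clients
    seed = int(np.random.choice(candidate_seeds))        # uniform here; the paper biases this
    start = replay_updates(np.zeros(DIM), prev_history)  # catch up to the global model
    round_payload.append((seed, client_zo_step(start, seed)))  # ~12 bytes each on the wire
prev_history += round_payload                   # server appends and rebroadcasts seeds + scalars
global_params = replay_updates(np.zeros(DIM), prev_history)
```

Because each update is fully determined by a seed index and a scalar, a round's payload stays in the kilobyte range regardless of the model's parameter count, which is the property the title's "under 18 Kilobytes" refers to.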