Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning (2310.00247v2)

Published 30 Sep 2023 in cs.LG and cs.DC

Abstract: Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data. Simultaneously, foundation models (FMs) have gained traction in the AI community due to their exceptional performance across various tasks. However, integrating FMs into FL presents challenges, primarily due to their substantial size and intensive resource requirements, especially given the resource heterogeneity of edge FL systems. We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges. RaFFM introduces specialized model compression algorithms tailored to FL scenarios, such as salient parameter prioritization and high-performance subnetwork extraction. These algorithms enable dynamic scaling of a given transformer-based FM to fit heterogeneous resource constraints at the network edge during both FL optimization and deployment. Experimental results demonstrate that RaFFM achieves significantly better resource utilization and requires fewer resources to deploy FMs in FL. Despite the lower resource consumption, target models optimized by RaFFM perform on par with traditional FL methods applied to full-sized FMs, across tasks in both natural language processing and computer vision.
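To make the two ideas named in the abstract concrete, below is a minimal, hypothetical sketch of (1) salient parameter prioritization and (2) sub-network extraction, written against a toy stack of linear layers rather than RaFFM's actual implementation. The L1-based saliency score, the width "keep ratio" budget, and all function names are assumptions for illustration only; the paper applies these ideas to transformer-based FMs.

```python
# Toy sketch (not RaFFM's code): rank output units of each layer by an
# assumed L1 saliency score, then extract a narrower sub-network that
# fits a client's resource budget by copying over only the salient
# rows/columns of each weight matrix.

import torch
import torch.nn as nn


def salient_column_indices(linear: nn.Linear, keep_ratio: float) -> torch.Tensor:
    """Return indices of the most salient output units of a linear layer,
    using an L1-norm proxy for saliency (assumed scoring rule)."""
    saliency = linear.weight.abs().sum(dim=1)            # one score per output unit
    k = max(1, int(keep_ratio * linear.out_features))
    return torch.topk(saliency, k).indices.sort().values


def extract_subnetwork(model: nn.Sequential, keep_ratio: float) -> nn.Sequential:
    """Build a narrower copy of a stack of Linear layers whose hidden width
    matches the client budget, keeping the final task head at full width."""
    layers, prev_idx = [], None
    linears = [m for m in model if isinstance(m, nn.Linear)]
    for i, layer in enumerate(linears):
        keep_out = (torch.arange(layer.out_features) if i == len(linears) - 1
                    else salient_column_indices(layer, keep_ratio))
        keep_in = prev_idx if prev_idx is not None else torch.arange(layer.in_features)
        new_layer = nn.Linear(len(keep_in), len(keep_out))
        new_layer.weight.data = layer.weight.data[keep_out][:, keep_in].clone()
        new_layer.bias.data = layer.bias.data[keep_out].clone()
        layers += [new_layer, nn.ReLU()]
        prev_idx = keep_out
    return nn.Sequential(*layers[:-1])                    # drop trailing ReLU


if __name__ == "__main__":
    full_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                               nn.Linear(128, 128), nn.ReLU(),
                               nn.Linear(128, 10))
    # A resource-constrained client might receive a half-width sub-network.
    client_model = extract_subnetwork(full_model, keep_ratio=0.5)
    print(client_model)
    print(client_model(torch.randn(2, 64)).shape)         # torch.Size([2, 10])
```

In this sketch the server would hold the full-sized model and hand each client a sub-network sized to its budget; how RaFFM selects salient parameters, aggregates heterogeneous sub-networks, and handles attention blocks is described in the paper itself.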
