FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout (2307.02623v3)
Abstract: Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also creates a heterogeneous training environment, since performance capabilities vary across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this straggler-induced performance bottleneck by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on a weight-update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in the set of stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.
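The abstract describes Invariant Dropout only at a high level, so the sketch below illustrates one plausible reading of it in NumPy. It is a minimal sketch, not the FLuID implementation: the function names (`invariant_neuron_mask`, `extract_sub_model`), the row-per-neuron weight layout, and the L1 update metric are all assumptions for illustration. The idea it demonstrates is the one the abstract states: neurons whose weights changed least across aggregation rounds on non-straggler devices are treated as invariant and dropped from the straggler's sub-model, with the drop fraction per straggler set from client performance profiling.

```python
import numpy as np

def invariant_neuron_mask(w_prev, w_new, drop_frac):
    """Return a boolean mask of neurons to KEEP in a straggler's sub-model.

    Each row of a dense layer's weight matrix is treated as one neuron.
    Neurons whose weights changed the least between rounds ("invariant"
    neurons) are dropped first: retraining them on a slow device adds
    compute cost but contributes the least new information.
    """
    update_mag = np.abs(w_new - w_prev).sum(axis=1)  # per-neuron L1 update size
    n_drop = int(drop_frac * len(update_mag))
    drop_idx = np.argsort(update_mag)[:n_drop]       # smallest updates first
    keep = np.ones(len(update_mag), dtype=bool)
    keep[drop_idx] = False
    return keep

def extract_sub_model(w_new, b_new, keep):
    """Slice out the rows (neurons) retained for the straggler."""
    return w_new[keep], b_new[keep]

# Toy 3-neuron layer; neuron 1 barely changed in the last aggregation round.
w_prev = np.array([[0.10, 0.20],
                   [0.50, 0.50],
                   [0.30, 0.40]])
w_new  = np.array([[0.15, 0.28],
                   [0.50, 0.51],
                   [0.10, 0.60]])
b_new  = np.array([0.0, 0.1, 0.2])

# In a real system drop_frac would come from profiling the straggler's
# speed relative to non-stragglers; 0.4 here just forces one neuron out.
keep = invariant_neuron_mask(w_prev, w_new, drop_frac=0.4)
sub_w, sub_b = extract_sub_model(w_new, b_new, keep)
print(keep)          # [ True False  True] -- neuron 1 is dropped
print(sub_w.shape)   # (2, 2)
```

A full system would also need the kept-neuron indices to slice adjacent layers consistently and to fold the straggler's sub-model update back into the global model during aggregation; those details are beyond this sketch.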