LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning (2401.11647v4)
Abstract: Many studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of the raw data distributed across edge devices. However, edge devices often struggle with the high computational and communication costs imposed by SSL and FL algorithms. With the deployment of more complex and large-scale models, such as Transformers, these challenges are exacerbated. To tackle this, we propose the Layer-Wise Federated Self-Supervised Learning (LW-FedSSL) approach, which allows edge devices to incrementally train a small part of the model at a time. Specifically, LW-FedSSL decomposes training into multiple stages, each responsible for only a specific layer (or a block of layers) of the model. Since only a portion of the model is active for training at any given time, LW-FedSSL significantly reduces computational requirements. Additionally, only the active model portion needs to be exchanged between the FL server and clients, reducing the communication overhead. This enables LW-FedSSL to jointly address both the computational and communication challenges in FL. Depending on the SSL algorithm used, LW-FedSSL can achieve up to a $3.34\times$ reduction in memory usage, $4.20\times$ fewer computational operations (GFLOPs), and a $5.07\times$ lower communication cost, while maintaining performance comparable to its end-to-end training counterpart. Furthermore, we explore a progressive training strategy called Prog-FedSSL, which offers a $1.84\times$ reduction in GFLOPs and a $1.67\times$ reduction in communication cost while maintaining the same memory requirements as end-to-end training. While Prog-FedSSL is less resource-efficient than LW-FedSSL, its performance improvements make it a viable candidate for FL environments with more lenient resource constraints.
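The stage-wise procedure described in the abstract can be illustrated with a short sketch. The following is a minimal, self-contained Python/PyTorch example (not the paper's implementation) of layer-wise federated training: the model is split into blocks, each stage trains only its own block on the clients while earlier blocks stay frozen, and only the active block's weights are exchanged with and averaged by the server. The toy block decomposition, the FedAvg-style aggregation, the helper names (`client_update`, `fedavg`), and the placeholder loss are all illustrative assumptions; in practice the active block would be trained with a proper SSL objective (e.g., a contrastive loss).

```python
import copy
import torch
import torch.nn as nn

# Toy encoder decomposed into blocks; training stage s trains only blocks[s].
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 16)),
])

def client_update(active_block, frozen_prefix, data, lr=1e-2, local_steps=5):
    """Local training of the active block only; earlier blocks stay frozen."""
    block = copy.deepcopy(active_block)           # client's local copy of the active block
    opt = torch.optim.SGD(block.parameters(), lr=lr)
    for _ in range(local_steps):
        x = data[torch.randint(len(data), (8,))]  # mini-batch of unlabeled local data
        with torch.no_grad():                     # frozen prefix: forward pass only
            h = frozen_prefix(x)
        z = block(h)
        # Placeholder self-supervised-style loss (assumption, not the paper's objective).
        loss = (z @ z.T).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return block.state_dict()

def fedavg(states):
    """Average the clients' updates for the active block (FedAvg-style aggregation)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return avg

client_data = [torch.randn(100, 32) for _ in range(4)]  # 4 simulated clients
rounds_per_stage = 3
for stage in range(len(blocks)):                     # one training stage per block
    prefix = nn.Sequential(*blocks[:stage]).eval()   # previously trained blocks, frozen
    for _ in range(rounds_per_stage):
        # Only the active block's parameters travel between server and clients.
        states = [client_update(blocks[stage], prefix, d) for d in client_data]
        blocks[stage].load_state_dict(fedavg(states))
```

Because earlier blocks are frozen and excluded from the exchanged state, both the per-round training cost and the uplink/downlink payload scale with the size of the active block rather than the full model, which is the source of the memory, GFLOPs, and communication savings reported in the abstract.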