Open-Vocabulary Federated Learning with Multimodal Prototyping (2404.01232v2)
Abstract: Existing federated learning (FL) studies usually assume the training label space and test label space are identical. However, in real-world applications, this assumption is too ideal to be true. A new user could come up with queries that involve data from unseen classes, and such open-vocabulary queries would directly defect such FL systems. Therefore, in this work, we explicitly focus on the under-explored open-vocabulary challenge in FL. That is, for a new user, the global server shall understand her/his query that involves arbitrary unknown classes. To address this problem, we leverage the pre-trained vision-LLMs (VLMs). In particular, we present a novel adaptation framework tailored for VLMs in the context of FL, named as Federated Multimodal Prototyping (Fed-MP). Fed-MP adaptively aggregates the local model weights based on light-weight client residuals, and makes predictions based on a novel multimodal prototyping mechanism. Fed-MP exploits the knowledge learned from the seen classes, and robustifies the adapted VLM to unseen categories. Our empirical evaluation on various datasets validates the effectiveness of Fed-MP.
- Food-101–mining discriminative components with random forests. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VI 13, pages 446–461. Springer.
- Predicting visual exemplars of unseen classes for zero-shot learning. In Proceedings of the IEEE international conference on computer vision, pages 3476–3485.
- Multimodal federated learning: A survey. Sensors, 23(15):6986.
- Feddat: An approach for foundation model finetuning in multi-modal heterogeneous federated learning. arXiv preprint arXiv:2308.12305.
- Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee.
- Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In 2004 conference on computer vision and pattern recognition workshop, pages 178–178. IEEE.
- pfedprompt: Learning personalized prompt for vision-language models in federated learning. In Proceedings of the ACM Web Conference 2023, pages 1364–1374.
- Promptfl: Let federated participants cooperatively learn prompts instead of models-federated learning in age of foundation model. IEEE Transactions on Mobile Computing.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.
- Safe-student for safe deep semi-supervised learning with unseen-class unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14585–14594.
- Yusuke Iwasawa and Yutaka Matsuo. 2021. Test-time classifier adjustment module for model-agnostic domain generalization. Advances in Neural Information Processing Systems, 34:2427–2440.
- Harmofl: Harmonizing local and global drifts in federated learning on heterogeneous medical images. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 1087–1095.
- 3d object representations for fine-grained categorization. In Proceedings of the IEEE international conference on computer vision workshops, pages 554–561.
- Unseen classes at a later time? no problem. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9245–9254.
- Visual prompt based personalized federated learning. arXiv preprint arXiv:2303.08678.
- Federated learning meets natural language processing: A survey. arXiv preprint arXiv:2107.12603.
- Fedvision: An online visual object detection platform powered by federated learning. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 13172–13179.
- Fedclip: Fast generalization and personalization for clip in federated learning. arXiv preprint arXiv:2302.13485.
- Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151.
- Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR.
- Maria-Elena Nilsback and Andrew Zisserman. 2008. Automated flower classification over a large number of classes. In 2008 Sixth Indian conference on computer vision, graphics & image processing, pages 722–729. IEEE.
- Text-driven prompt generation for vision-language models in federated learning. arXiv preprint arXiv:2310.06123.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR.
- Unseen class discovery in open-world classification. arXiv preprint arXiv:1801.05609.
- Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
- Feature distribution matching for federated domain generalization. In Asian Conference on Machine Learning, pages 942–957. PMLR.
- Closing the generalization gap of cross-silo federated medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20866–20875.
- Efficient model personalization in federated learning via client-specific prompt generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19159–19168.
- Task residual for tuning vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10899–10909.
- Federated domain generalization with generalization adjustment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3954–3963.
- Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16816–16825.
- Huimin Zeng (25 papers)
- Zhenrui Yue (24 papers)
- Dong Wang (628 papers)