Open-Vocabulary Federated Learning with Multimodal Prototyping (2404.01232v2)

Published 1 Apr 2024 in cs.CL and cs.CV

Abstract: Existing federated learning (FL) studies usually assume that the training label space and the test label space are identical. However, in real-world applications, this assumption is too idealistic to hold. A new user may issue queries that involve data from unseen classes, and such open-vocabulary queries would directly defeat such FL systems. Therefore, in this work, we explicitly focus on the under-explored open-vocabulary challenge in FL: for a new user, the global server should understand his or her query even when it involves arbitrary unknown classes. To address this problem, we leverage pre-trained vision-language models (VLMs). In particular, we present a novel adaptation framework tailored for VLMs in the context of FL, named Federated Multimodal Prototyping (Fed-MP). Fed-MP adaptively aggregates the local model weights based on lightweight client residuals, and makes predictions through a novel multimodal prototyping mechanism. Fed-MP exploits the knowledge learned from the seen classes and makes the adapted VLM robust to unseen categories. Our empirical evaluation on various datasets validates the effectiveness of Fed-MP.
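To make the two ideas named in the abstract concrete, the sketch below illustrates (1) server-side aggregation of lightweight client residuals with adaptive weights and (2) open-vocabulary prediction against fused text and visual prototypes. This is a minimal illustration, not the authors' implementation: the softmax weighting over per-client relevance scores, the fusion coefficient `alpha`, and the random vectors standing in for frozen CLIP-style embeddings are all assumptions made for the example.

```python
# Minimal sketch (not the paper's code): adaptive aggregation of client residuals
# plus multimodal prototyping, with random embeddings standing in for the outputs
# of a frozen CLIP-style vision-language encoder.
import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # assumed embedding dimension of the frozen encoder

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# --- 1) Server-side aggregation of lightweight client residuals (illustrative) ---
# Each client contributes a small residual vector plus a scalar relevance score;
# the server combines residuals with softmax weights over those scores.
client_residuals = [rng.normal(scale=0.01, size=DIM) for _ in range(5)]
relevance = np.array([0.8, 0.2, 0.5, 0.9, 0.1])        # hypothetical per-client scores
weights = np.exp(relevance) / np.exp(relevance).sum()   # softmax weighting (assumed)
global_residual = sum(w * r for w, r in zip(weights, client_residuals))

# --- 2) Multimodal prototyping for an open-vocabulary query ---
# Text prototypes come from class-name embeddings shifted by the aggregated residual;
# visual prototypes are means of a few support-image embeddings per class. The query
# is assigned to the class whose fused prototype has the highest cosine similarity.
class_names = ["zebra", "submarine", "violin"]          # classes unseen during training
text_protos = l2_normalize(rng.normal(size=(3, DIM)) + global_residual)
visual_protos = l2_normalize(rng.normal(size=(3, DIM)))
alpha = 0.5                                             # assumed fusion coefficient
prototypes = l2_normalize(alpha * text_protos + (1 - alpha) * visual_protos)

query = l2_normalize(rng.normal(size=DIM))              # embedding of the query image
scores = prototypes @ query
print("predicted class:", class_names[int(np.argmax(scores))])
```

In practice the stand-in embeddings would be replaced by actual text and image features from the shared pre-trained VLM; the point of the sketch is only the structure: residuals are cheap to communicate, and fusing text and visual prototypes lets the server answer queries about classes never seen by any client.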

Authors (3)
  1. Huimin Zeng (25 papers)
  2. Zhenrui Yue (24 papers)
  3. Dong Wang (628 papers)