Open-Vocabulary Federated Learning with Multimodal Prototyping (2404.01232v2)

Published 1 Apr 2024 in cs.CL and cs.CV

Abstract: Existing federated learning (FL) studies usually assume that the training label space and the test label space are identical. In real-world applications, however, this assumption rarely holds. A new user could issue queries that involve data from unseen classes, and such open-vocabulary queries would directly defeat such FL systems. Therefore, in this work, we explicitly focus on the under-explored open-vocabulary challenge in FL. That is, for a new user, the global server should understand their query even when it involves arbitrary unknown classes. To address this problem, we leverage pre-trained vision-language models (VLMs). In particular, we present a novel adaptation framework tailored for VLMs in the context of FL, named Federated Multimodal Prototyping (Fed-MP). Fed-MP adaptively aggregates the local model weights based on lightweight client residuals and makes predictions based on a novel multimodal prototyping mechanism. Fed-MP exploits the knowledge learned from the seen classes and makes the adapted VLM more robust to unseen categories. Our empirical evaluation on various datasets validates the effectiveness of Fed-MP.

Authors (3)

Huimin Zeng
Zhenrui Yue
Dong Wang
