Federated Non-negative Matrix Factorization for Short Texts Topic Modeling with Mutual Information (2205.13300v1)

Published 26 May 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Non-negative matrix factorization (NMF) based topic modeling is widely used in NLP to uncover hidden topics in short text documents. Training a high-quality topic model usually requires a large amount of textual data. In many real-world scenarios, customer textual data is private and sensitive, which precludes uploading it to data centers. This paper proposes a Federated NMF (FedNMF) framework, which allows multiple clients to collaboratively train a high-quality NMF-based topic model with locally stored data. However, standard federated learning significantly undermines the performance of topic models in downstream tasks (e.g., text classification) when the data distribution over clients is heterogeneous. To alleviate this issue, we further propose FedNMF+MI, which simultaneously maximizes the mutual information (MI) between the count features of local texts and their topic weight vectors to mitigate the performance degradation. Experimental results show that our FedNMF+MI methods outperform Federated Latent Dirichlet Allocation (FedLDA) and FedNMF without MI on short texts by a significant margin in both coherence score and classification F1 score.
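
To make the federated setup described in the abstract more concrete, the following is a minimal sketch of one plausible way to train an NMF topic model across clients. It is not the authors' algorithm: it assumes a Frobenius-norm objective, standard Lee-Seung multiplicative updates on each client, and FedAvg-style weighted averaging of a shared topic-word matrix `H`, and it omits the mutual information term of FedNMF+MI entirely. The function names `local_update` and `fed_nmf` and all parameters are hypothetical.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def local_update(V, W, H, n_steps=5):
    """Run a few Lee-Seung multiplicative updates for ||V - W H||_F^2 on one client.

    V: (docs x vocab) non-negative count matrix that never leaves the client.
    W: (docs x topics) local document-topic weights.
    H: (topics x vocab) shared topic-word matrix received from the server.
    """
    H = H.copy()  # do not modify the server's copy in place
    for _ in range(n_steps):
        W = W * (V @ H.T) / (W @ H @ H.T + EPS)
        H = H * (W.T @ V) / (W.T @ W @ H + EPS)
    return W, H


def fed_nmf(client_data, n_topics, n_rounds=20, seed=0):
    """Assumed FedAvg-style loop: only H is exchanged, W stays on each client."""
    rng = np.random.default_rng(seed)
    vocab = client_data[0].shape[1]
    H = rng.random((n_topics, vocab)) + 0.1
    Ws = [rng.random((V.shape[0], n_topics)) + 0.1 for V in client_data]
    sizes = np.array([V.shape[0] for V in client_data], dtype=float)
    for _ in range(n_rounds):
        local_Hs = []
        for k, V in enumerate(client_data):
            Ws[k], H_k = local_update(V, Ws[k], H)
            local_Hs.append(H_k)
        # Server step: average client topic matrices, weighted by document count.
        H = np.average(np.stack(local_Hs), axis=0, weights=sizes)
    return Ws, H
```

In this sketch, raw document counts stay on each client and only the topic-word matrix is communicated. When client data is heterogeneous, the locally updated copies of `H` can drift apart before averaging, which is the kind of degradation the abstract says FedNMF+MI is meant to mitigate.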

Citations (10)
