Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks

(2401.10070)
Published Jan 18, 2024 in cs.CL, cs.SD, and eess.AS

Abstract

To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the commonly used FL approach in S2T tasks (i.e., FedAvg) typically suffers from extensive communication overhead due to multi-round interactions based on the whole model, and from performance degradation caused by data heterogeneity among clients. To address these issues, we propose a personalized federated S2T framework that introduces FedLoRA, a lightweight LoRA module for client-side tuning and interaction with the server to minimize communication overhead, and FedMem, a global model equipped with a k-nearest-neighbor (kNN) classifier that captures client-specific distributional shifts to achieve personalization and overcome data heterogeneity. Extensive experiments based on Conformer and Whisper backbone models on the CoVoST and GigaSpeech benchmarks show that our approach significantly reduces communication overhead on all S2T tasks and effectively personalizes the global model to overcome data heterogeneity.

Overview

  • The paper discusses a personalized federated learning framework for speech-to-text tasks that addresses privacy concerns.

  • FedLoRA, the first stage of the framework, reduces communication overhead by training and transmitting only a lightweight LoRA module rather than the whole model.

  • The second stage, FedMem, enhances personalization by using a k-nearest-neighbor classifier to tailor the global model to client-specific data.

  • The framework was tested on two benchmark datasets and showed up to a 96.5% reduction in communication overhead while maintaining or improving performance.

  • The paper suggests the combined approach of efficient training and personalization can reduce bandwidth needs without compromising accuracy.

Overview of Personalized Federated Learning for S2T

Federated learning (FL) is highly relevant to speech-to-text (S2T) tasks such as automatic speech recognition (ASR) and speech translation (ST), where it preserves privacy and helps comply with legal standards. FL enables collaborative training of a global model without sharing private client data. However, the standard approach (FedAvg) faces two challenges: extensive communication overhead, since clients exchange the full model with the server over many rounds, and performance degradation due to data heterogeneity among clients. Addressing these issues, the paper introduces an efficient, personalized FL framework for S2T tasks built on two strategies: FedLoRA and FedMem.
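To make the communication cost concrete, here is a minimal sketch of one FedAvg round in PyTorch; the helper names (fedavg_round, local_train) are hypothetical, and the point is only that each client downloads and uploads the full model state every round:

```python
# Minimal FedAvg round (illustrative sketch, not the paper's code).
import copy
import torch

def fedavg_round(global_model, client_datasets, local_train):
    """One communication round: every client trains on a full copy of the
    global model and uploads its entire state dict back to the server."""
    client_states = []
    for data in client_datasets:
        local = copy.deepcopy(global_model)        # download: full model
        local_train(local, data)                   # a few local SGD steps
        client_states.append(local.state_dict())   # upload: full model
    # Server aggregation: element-wise average of all uploaded parameters.
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(dim=0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```

For an S2T backbone with hundreds of millions of parameters, every round moves that many floats per client in each direction, which is exactly the overhead the proposed framework targets.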

FedLoRA and FedMem: The Two-Stage Solution

The proposed FL framework operates in two stages. The first stage, FedLoRA, reduces communication overhead by freezing the pretrained backbone and optimizing only a lightweight Low-Rank Adaptation (LoRA) module. Because clients exchange just this small module with the server, communication and computational demands drop substantially.
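A minimal sketch of the FedLoRA idea follows, assuming a PyTorch implementation; the class and helper names here are ours, not the paper's. The pretrained weight stays frozen, only the low-rank factors A and B are trained, and only those tensors need to be uploaded:

```python
# Sketch of a LoRA-wrapped linear layer whose trainable state is tiny
# compared with the frozen backbone (names and defaults are assumptions).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus scaled low-rank update: W x + (alpha/r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def lora_state(model: nn.Module):
    """Only the LoRA tensors are exchanged with the server each round."""
    return {k: v for k, v in model.state_dict().items()
            if k.split('.')[-1] in ('A', 'B')}
```

With rank 8 on a 1024x1024 projection, the exchanged update amounts to 2 x 8 x 1024 values instead of roughly a million, which is the source of the communication savings.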

The second stage, termed FedMem, equips the global model with a k-nearest-neighbor (kNN) classifier that captures client-specific speech characteristics, providing the personalization needed to mitigate performance loss from data heterogeneity among clients. Each client memorizes key representations of its own data, and at inference the global model retrieves this tailored information, leading to more accurate, client-adapted predictions.
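The sketch below illustrates this memorize-then-retrieve idea in the style of kNN-LM decoding; the interpolation weight, neighbor count, and temperature are illustrative assumptions rather than the paper's settings:

```python
# kNN memory sketch: store (decoder hidden state, token) pairs from client
# data, then mix a retrieval distribution into the model's distribution.
import torch
import torch.nn.functional as F

class KNNMemory:
    def __init__(self, lam: float = 0.3, k: int = 8, temp: float = 10.0):
        self.keys, self.vals = [], []     # hidden states and their token ids
        self.lam, self.k, self.temp = lam, k, temp

    def add(self, hidden: torch.Tensor, token_id: int):
        """Memorize one (hidden state, gold token) pair from the client's data."""
        self.keys.append(hidden.detach())
        self.vals.append(token_id)

    def interpolate(self, hidden: torch.Tensor, model_logits: torch.Tensor):
        """Blend the global model's next-token distribution with a kNN
        distribution built from the client's datastore."""
        keys = torch.stack(self.keys)                        # (N, D)
        dists = torch.cdist(hidden[None], keys)[0]           # L2 distances
        topd, topi = dists.topk(min(self.k, len(self.vals)), largest=False)
        knn_p = torch.zeros_like(model_logits)
        for w, i in zip(F.softmax(-topd / self.temp, dim=0), topi):
            knn_p[self.vals[i]] += w                         # mass on stored tokens
        return (1 - self.lam) * F.softmax(model_logits, dim=0) + self.lam * knn_p
```

Because the datastore lives on the client and is consulted only at inference, the shared global model acquires client-specific behavior without any additional training rounds.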

Experimental Validation

The framework's efficacy is demonstrated through experiments on two benchmark datasets: CoVoST, reflecting dialect variation, and GigaSpeech, representing multi-domain speech. The results show that FedLoRA reduces communication overhead by up to 96.5% while matching or exceeding the performance of centralized training. In addition, incorporating FedMem further personalizes the global model, improving its resilience to data-distribution disparities across clients.

Conclusion and Future Work

This paper makes a compelling case for a personalized FL framework tailored to S2T tasks. By combining parameter-efficient training with retrieval-based personalization, it significantly cuts bandwidth requirements without sacrificing accuracy. The method's ability to maintain performance while reducing communication load makes it an attractive option in the FL field. Future work could explore alternative memorization-retrieval methods to further improve inference speed and performance.
