Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks (2101.08919v2)

Published 22 Jan 2021 in eess.AS, cs.CR, and cs.SD

Abstract: As users increasingly rely on cloud-based computing services, it is important to ensure that uploaded speech data remains private. Existing solutions rely either on server-side methods or focus on hiding speaker identity. While these approaches reduce certain security concerns, they do not give users client-side control over whether their biometric information is sent to the server. In this paper, we formally define client-side privacy and discuss its three unique technical challenges: (1) direct manipulation of raw data on client devices, (2) adaptability with a broad range of server-side processing models, and (3) low time and space complexity for compatibility with limited-bandwidth devices. Solving these challenges requires new models that achieve high-fidelity reconstruction, privacy preservation of sensitive personal attributes, and efficiency during training and inference. As a step towards client-side privacy for speech recognition, we investigate three techniques spanning signal processing, disentangled representation learning, and adversarial training. Through a series of gender and accent masking tasks, we observe that each method has its unique strengths, but none manage to effectively balance the trade-offs between performance, privacy, and complexity. These insights call for more research in client-side privacy to ensure a safer deployment of cloud-based speech processing services.

Citations (7)

View on Semantic Scholar