Frequency Estimation Under Multiparty Differential Privacy: One-shot and Streaming
(2104.01808v2)
Published 5 Apr 2021 in cs.CR, cs.DS, and cs.LG
Abstract: We study the fundamental problem of frequency estimation under both privacy and communication constraints, where the data is distributed among $k$ parties. We consider two application scenarios: (1) one-shot, where the data is static and the aggregator conducts a one-time computation; and (2) streaming, where each party receives a stream of items over time and the aggregator continuously monitors the frequencies. We adopt the model of multiparty differential privacy (MDP), which is more general than local differential privacy (LDP) and (centralized) differential privacy. Our protocols achieve optimality (up to logarithmic factors) permissible by the more stringent of the two constraints. In particular, when specialized to the $\varepsilon$-LDP model, our protocol achieves an error of $\sqrt{k}/(e^{\Theta(\varepsilon)}-1)$ using $O(k\max\{\varepsilon, \frac{1}{\varepsilon}\})$ bits of communication and $O(k \log u)$ bits of public randomness, where $u$ is the size of the domain.
The paper introduces a multiparty differential privacy (MDP) model that bridges local and centralized privacy to enable accurate frequency estimation in distributed systems.
The study achieves optimal error rates in one-shot estimation with an error of √k/(e^(Θ(ε))-1) while also adapting the methodology for continuous streaming data using a sliding-window approach.
The proposed protocols significantly reduce communication and computational overhead, offering a scalable solution for privacy-preserving data aggregation in real-world applications.
Overview of Frequency Estimation Under Multiparty Differential Privacy
This paper addresses the issue of frequency estimation in distributed systems where privacy and communication constraints are paramount. Specifically, it tackles environments with k distinct parties where each party holds a dataset. The research is conducted under two primary scenarios: one where data is static (one-shot) and another where data is continuously streaming. Central to this investigation is the model of multiparty differential privacy (MDP), which serves as a more encompassing framework than local differential privacy (LDP) or centralized differential privacy (DP).
Research Contributions and Claims
MDP Model: The paper studies frequency estimation under MDP, a model that is more general than both LDP and centralized DP. Working in this model lets a single protocol and analysis cover distributed datasets held by k parties, rather than treating the local and centralized settings separately.
Optimal Protocols: The authors introduce protocols whose error is optimal up to logarithmic factors. The achievable error is determined by whichever of the two constraints, privacy or communication, is more stringent.
One-shot Scenario: For one-shot frequency estimation, the paper achieves an error of √k/(e^(Θ(ε))-1) under the ε-LDP model. The error grows with the square root of the number of parties k and shrinks as the privacy parameter ε increases; the domain size u enters through the O(k log u) bits of public randomness the protocol uses.
Streaming Scenario: In the streaming setting, where each party receives items continuously, the paper employs the sliding-window model. The protocol supports continuous monitoring: the aggregator tracks item frequencies within the latest w time steps while the privacy guarantee holds throughout the computation.
Space-Time Complexity: The paper claims efficiency not only in terms of error rates against privacy guarantees but also in reducing communication overhead and computational space, helping the scalability of distributed machine learning systems, such as federated learning.
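The sliding-window model used in the streaming scenario can be illustrated with a minimal sketch. This is not the paper's protocol: it tracks exact counts for a single stream and adds no privacy noise, and the class name `SlidingWindowCounter` and its methods are hypothetical names chosen for illustration. It only shows what "frequencies within the latest w time steps" means operationally.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Exact item frequencies over the latest w time steps (illustrative, non-private)."""

    def __init__(self, w: int) -> None:
        if w <= 0:
            raise ValueError("window length w must be positive")
        self.w = w
        self.window = deque()    # (time_step, item) pairs, oldest first
        self.counts = Counter()  # item -> number of arrivals still in the window

    def add(self, t: int, item: str) -> None:
        """Record that `item` arrived at time step t (t must be non-decreasing)."""
        self.window.append((t, item))
        self.counts[item] += 1
        self._expire(t)

    def _expire(self, now: int) -> None:
        # Keep only arrivals with time step in (now - w, now].
        while self.window and self.window[0][0] <= now - self.w:
            _, old_item = self.window.popleft()
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]

    def frequency(self, item: str) -> int:
        """Number of times `item` arrived within the current window."""
        return self.counts.get(item, 0)
```

For example, with w = 3, after arrivals of "a" at steps 1 and 2 and "b" at steps 3 and 4, the arrival at step 1 has left the window, so `frequency("a")` is 1 and `frequency("b")` is 2. A private multiparty version would additionally need each party to randomize what it sends, which this sketch does not attempt.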
Strong Results and Numerical Claims
The paper states that its protocol for one-shot frequency estimation reaches optimality (up to constant factors) both in the high-privacy regime (ε=O(1)) and in settings where the adversary is passive.
When compared to LDP protocols, the communication cost is significantly reduced to O(k·max{ε, 1/ε}) bits, marking an exponential reduction in necessary resources relative to the degree of privacy protection.
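To give the ε-LDP quantities above some concrete scale, the following sketch uses classic binary randomized response, where each of k parties holds a single bit. This is a textbook LDP mechanism swapped in for illustration, not the paper's protocol, and all function names are hypothetical. The helpers `error_scale` and `communication_scale` simply evaluate the quoted expressions with the hidden constants assumed to be 1, which the paper does not claim.

```python
import math
import random
from typing import Sequence


def randomized_response(bit: int, eps: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_count(reports: Sequence[int], eps: float) -> float:
    """Unbiased estimate of how many parties hold bit 1, from their noisy reports."""
    k = len(reports)
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    # E[sum(reports)] = p * n1 + (1 - p) * (k - n1); solve for n1.
    return (sum(reports) - (1.0 - p) * k) / (2.0 * p - 1.0)


def error_scale(k: int, eps: float) -> float:
    """sqrt(k) / (e^eps - 1), with the Theta constant assumed to be 1."""
    return math.sqrt(k) / math.expm1(eps)


def communication_scale(k: int, eps: float) -> float:
    """k * max(eps, 1/eps) bits, with the O() constant assumed to be 1."""
    return k * max(eps, 1.0 / eps)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    k, eps = 10_000, 1.0
    bits = [1 if i < 3_000 else 0 for i in range(k)]  # 3000 parties hold bit 1
    reports = [randomized_response(b, eps, rng) for b in bits]
    print("estimated count:", estimate_count(reports, eps))
    print("error scale:", error_scale(k, eps))
    print("communication scale (bits):", communication_scale(k, eps))
```

The standard deviation of this estimator is √(k·e^ε)/(e^ε-1), which has the same √k/(e^ε-1) shape when ε = O(1). Each randomized-response report here costs one bit per party; the max{ε, 1/ε} factor in the quoted bound belongs to the paper's protocol and is only evaluated, not reproduced, by this sketch.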
Implications and Future Speculation
Theoretically, this work extends the privacy landscape by developing MDP as a model that links the local and centralized settings, and it opens pathways for further exploration of distributed privacy methodologies. Practically, these findings could inform the design of privacy-preserving systems in healthcare data aggregation, secure financial reporting, and other privacy-critical domains.
Looking forward, the techniques and results could motivate work on refining existing distributed privacy models and developing new ones that minimize resource usage while maintaining accuracy and meeting stringent privacy requirements. Improved variants could directly affect the design and deployment of applications and platforms that handle sensitive information.
In summary, this paper contributes significantly to the sphere of privacy-preserving protocols in distributed systems, by providing both foundational insights and practical solutions to frequency estimation challenges under the stringent constraints of multiparty settings. As systems evolve toward more distributed architectures, the principles and methods herein may serve as building blocks for future innovations.