Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 37 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 10 tok/s Pro
GPT-5 High 15 tok/s Pro
GPT-4o 84 tok/s Pro
Kimi K2 198 tok/s Pro
GPT OSS 120B 448 tok/s Pro
Claude Sonnet 4 31 tok/s Pro
2000 character limit reached

Private sampling: a noiseless approach for generating differentially private synthetic data (2109.14839v1)

Published 30 Sep 2021 in cs.CR, math.ST, and stat.TH

Abstract: In a world where artificial intelligence and data science become omnipresent, data sharing is increasingly locking horns with data-privacy concerns. Differential privacy has emerged as a rigorous framework for protecting individual privacy in a statistical database, while releasing useful statistical information about the database. The standard way to implement differential privacy is to inject a sufficient amount of noise into the data. However, in addition to other limitations of differential privacy, this process of adding noise will affect data accuracy and utility. Another approach to enable privacy in data sharing is based on the concept of synthetic data. The goal of synthetic data is to create an as-realistic-as-possible dataset, one that not only maintains the nuances of the original data, but does so without risk of exposing sensitive information. The combination of differential privacy with synthetic data has been suggested as a best-of-both-worlds solutions. In this work, we propose the first noisefree method to construct differentially private synthetic data; we do this through a mechanism called "private sampling". Using the Boolean cube as benchmark data model, we derive explicit bounds on accuracy and privacy of the constructed synthetic data. The key mathematical tools are hypercontractivity, duality, and empirical processes. A core ingredient of our private sampling mechanism is a rigorous "marginal correction" method, which has the remarkable property that importance reweighting can be utilized to exactly match the marginals of the sample to the marginals of the population.

Citations (13)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.