High Epsilon Synthetic Data Vulnerabilities in MST and PrivBayes (2402.06699v1)

Published 9 Feb 2024 in cs.CR

Abstract: Synthetic data generation (SDG) has become increasingly popular as a privacy-enhancing technology. It aims to preserve important statistical properties of its underlying training data while excluding any personally identifiable information. Many SDG algorithms have been developed in recent years to improve and balance both of these aims, and many of them provide robust differential privacy guarantees. However, we show here that if the differential privacy parameter $\varepsilon$ is set too high, unambiguous privacy leakage can result. We demonstrate this by conducting a novel membership inference attack (MIA) on two state-of-the-art differentially private SDG algorithms: MST and PrivBayes. Our work reveals previously unseen vulnerabilities in these generators and suggests that future work to strengthen their privacy is advisable. We present the heuristic for our MIA here. It assumes knowledge of auxiliary "population" data and of which SDG algorithm was used, and it uses this information to adapt the recent DOMIAS MIA specifically to MST and PrivBayes. Our approach went on to win the SNAKE challenge in November 2023.
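
The core of the attack is the DOMIAS density-ratio test: a candidate record is scored by how much denser it is under the synthetic data than under the auxiliary population data, since records that were in the generator's training set tend to be over-represented in its output. Below is a minimal sketch of that scoring step only; it assumes numeric tabular data, substitutes a Gaussian kernel density estimator for the paper's actual density models, and omits the MST- and PrivBayes-specific adaptations entirely, so all names, shapes, and parameters here are illustrative.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def domias_scores(synthetic, population, targets, bandwidth=0.5):
    """DOMIAS-style membership score: log p_syn(x) - log p_pop(x).

    Higher scores mean a record is over-represented in the synthetic
    data relative to the population, hinting it was a training member.
    """
    kde_syn = KernelDensity(bandwidth=bandwidth).fit(synthetic)
    kde_pop = KernelDensity(bandwidth=bandwidth).fit(population)
    # score_samples returns log-densities, so the difference is the log-ratio.
    return kde_syn.score_samples(targets) - kde_pop.score_samples(targets)

# Toy usage with random numeric data (shapes and threshold are illustrative):
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(1000, 4))   # stand-in for generator output
population = rng.normal(size=(1000, 4))  # auxiliary "population" data
targets = rng.normal(size=(10, 4))       # candidate records to test
members = domias_scores(synthetic, population, targets) > 0.0
```

In practice the threshold, the bandwidth, and above all the density estimators would be tuned to the generator under attack; per the abstract, the paper's heuristic exploits knowledge of which SDG algorithm was used at exactly that step.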

References (13)
  1. SNAKE Challenge: Sanitization algorithms under attack. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 5010–5014.
  2. Block neural autoregressive flow. In Uncertainty in Artificial Intelligence, 1263–1273. PMLR.
  3. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4): 211–407.
  4. LOGAN: Membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies.
  5. Monte Carlo and reconstruction membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies, 2019(4): 232–249.
  6. TAPAS: A toolbox for adversarial privacy auditing of synthetic data. arXiv preprint arXiv:2211.06550.
  7. Synthetic data – what, why and how? arXiv preprint arXiv:2205.03257.
  8. Winning the NIST Contest: A scalable and general approach to differentially private synthetic data. arXiv preprint arXiv:2108.04978.
  9. Scott, D. W. 2015. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons.
  10. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), 3–18.
  11. Membership inference attacks against synthetic data through overfitting detection. In International Conference on Artificial Intelligence and Statistics (AISTATS).
  12. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), 268–282. IEEE.
  13. PrivBayes: Private data release via Bayesian networks. ACM Transactions on Database Systems (TODS), 42(4): 1–41.
