VPAS: Publicly Verifiable and Privacy-Preserving Aggregate Statistics on Distributed Datasets (2403.15208v1)
Abstract: Aggregate statistics play an important role in extracting meaningful insights from distributed data while preserving privacy. A growing number of application domains, such as healthcare, use these statistics to advance research and improve patient care. In this work, we explore the challenge of input validation and public verifiability within privacy-preserving aggregation protocols. We address the scenario in which a party receives data from multiple sources and must prove to third parties, such as auditors, that the inputs are valid and that the computations over them are correct, while keeping the input data private. To meet these requirements, we propose the VPAS protocol. VPAS uses homomorphic encryption for data privacy and employs zero-knowledge proofs (ZKPs) and a blockchain system for input validation and public verifiability. We construct VPAS by extending existing verifiable encryption schemes into secure protocols that let N clients encrypt their inputs, aggregate them, and then release the final result to a collector in a verifiable manner. We implemented VPAS and experimentally evaluated its encryption, proof-generation, and verification costs. Our results show that the verifiability overhead of our protocol is 10x lower than that incurred by simply using conventional zkSNARKs. This efficiency gain makes input validation with public verifiability feasible for a wider range of applications that can tolerate a moderate proof-generation overhead.
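The abstract does not name the homomorphic scheme, so the following is only a minimal sketch under an assumption: additively homomorphic ("exponential") ElGamal, where multiplying ciphertexts adds plaintexts and the collector recovers a small sum with a baby-step giant-step search. It shows the encrypt, aggregate, and decrypt flow for N clients. The group parameters, function names, and client values are hypothetical. The group is far too small to be secure. The zero-knowledge proofs and blockchain components of VPAS are omitted entirely.

```python
import math
import secrets

# ASSUMPTION: exponential ElGamal stands in for the (unspecified) VPAS scheme.
# Toy parameters, INSECURE: P = 2*Q + 1 is a safe prime and G = 4 (a quadratic
# residue) generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def keygen():
    """Return (secret key x, public key h = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(h, m):
    """Enc(m) = (G^r, G^m * h^r) with fresh randomness r."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m, P) * pow(h, r, P) % P


def add(c, d):
    """Componentwise product of two ciphertexts encrypts the sum of plaintexts."""
    return c[0] * d[0] % P, c[1] * d[1] % P


def bsgs(target, bound):
    """Baby-step giant-step: smallest m in [0, bound) with G^m == target mod P."""
    n = math.isqrt(bound) + 1
    baby = {}
    e = 1
    for j in range(n):  # baby steps: table G^j -> j
        baby[e] = j
        e = e * G % P
    giant = pow(G, -n, P)  # G^(-n) mod P (needs Python 3.8+)
    gamma = target
    for i in range(n):  # giant steps: target * G^(-i*n)
        if gamma in baby:
            return i * n + baby[gamma]
        gamma = gamma * giant % P
    raise ValueError("plaintext outside search bound")


def decrypt(x, c, bound):
    """Compute G^m = c2 * c1^(-x), then solve the small discrete log."""
    gm = c[1] * pow(c[0], Q - x, P) % P  # c1 lies in the order-Q subgroup
    return bsgs(gm, bound)


if __name__ == "__main__":
    x, h = keygen()
    client_values = [12, 7, 30, 5]  # hypothetical inputs from N = 4 clients
    aggregate = encrypt(h, 0)
    for v in client_values:
        aggregate = add(aggregate, encrypt(h, v))
    print(decrypt(x, aggregate, bound=Q))  # 54
```

Decryption only works when the aggregate stays below the search bound. This is one reason input validation matters in this setting: a single out-of-range client value can corrupt or hide the true sum.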