LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers (2305.18396v3)
Abstract: The community has explored building private inference frameworks for transformer-based LLMs in a server-client setting, where the server holds the model parameters and the client inputs its private data (or prompt) for inference. However, these frameworks impose significant overhead when the private inputs are forward-propagated through the original LLMs. In this paper, we show that substituting the computation- and communication-heavy operators in the transformer architecture with privacy-computing friendly approximations can greatly reduce the private inference costs while incurring only a minor impact on model performance. Compared to the state-of-the-art Iron (NeurIPS 2022), our privacy-computing friendly model inference pipeline achieves a $5\times$ acceleration in computation and an 80% reduction in communication overhead, while retaining nearly identical accuracy.
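To make the operator-substitution idea concrete, below is a minimal PyTorch sketch, not the paper's exact construction: it swaps GELU activations for ReLU and replaces softmax with a quadratic approximation in the style of the cited MPCFormer work. The names `QuadSoftmax` and `make_privacy_friendly` and the shift constant `c` are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn


class QuadSoftmax(nn.Module):
    """Illustrative MPC-friendly softmax substitute: exp(x) is replaced by
    (x + c)^2, leaving only additions and multiplications, which
    secret-sharing and homomorphic-encryption protocols evaluate cheaply."""

    def __init__(self, c: float = 5.0):
        super().__init__()
        self.c = c  # hypothetical shift constant; tuned in practice

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        sq = (scores + self.c) ** 2
        return sq / sq.sum(dim=-1, keepdim=True)


def make_privacy_friendly(module: nn.Module) -> nn.Module:
    """Recursively swap GELU for ReLU, a common privacy-computing friendly
    substitution, since GELU's erf/tanh is expensive under MPC."""
    for name, child in module.named_children():
        if isinstance(child, nn.GELU):
            setattr(module, name, nn.ReLU())
        else:
            make_privacy_friendly(child)
    return module
```

In a pipeline along these lines, the attention layers' softmax calls would be routed through `QuadSoftmax`, and the rewritten model would then be fine-tuned or distilled to recover accuracy before being run inside the secure inference framework.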
- Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
- A full RNS variant of FV-like somewhat homomorphic encryption schemes. pages 423–442, October 2017.
- Generalization in NLI: Ways (not) to go beyond simple heuristics, 2021.
- Language models are few-shot learners, 2020.
- THE-X: Privacy-preserving transformer inference with homomorphic encryption. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3510–3520, Dublin, Ireland, May 2022. Association for Computational Linguistics.
- SPDZ2k: Efficient MPC mod $2^k$ for dishonest majority. In Advances in Cryptology – CRYPTO 2018, 2018.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, page 201–210. JMLR.org, 2016.
- Somewhat practical fully homomorphic encryption. IACR Cryptol. ePrint Arch., 2012:144, 2012.
- Kunihiko Fukushima. Cognitron: A self-organizing multilayered neural network. Biological cybernetics, 20(3-4):121–136, 1975.
- Iron: Private inference on transformers. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 15718–15731. Curran Associates, Inc., 2022.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016.
- Cheetah: Lean and fast secure two-party deep neural network inference. In 31st USENIX Security Symposium (USENIX Security 22), pages 809–826, Boston, MA, August 2022. USENIX Association.
- Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. PMLR, 2015.
- Extending oblivious transfers efficiently. In Dan Boneh, editor, Advances in Cryptology - CRYPTO 2003, pages 145–161, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
- GAZELLE: A low latency framework for secure neural network inference. In 27th USENIX Security Symposium (USENIX Security 18), pages 1651–1669, Baltimore, MD, August 2018. USENIX Association.
- Actively secure ot extension with optimal overhead. In Rosario Gennaro and Matthew Robshaw, editors, Advances in Cryptology – CRYPTO 2015, pages 724–741, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg.
- Crypten: Secure multi-party computation meets machine learning. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 4961–4973. Curran Associates, Inc., 2021.
- Block pruning for faster transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10619–10629, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics.
- MPCFormer: Fast, performant and private transformer inference with MPC. arXiv preprint arXiv:2211.01452, 2022.
- martFL: Enabling Utility-Driven Data Marketplace with a Robust and Verifiable Federated Learning Architecture. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023.
- Efficient 3PC for binary circuits with application to maliciously-secure DNN inference. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5377–5394, Anaheim, CA, August 2023. USENIX Association.
- Oblivious neural network predictions via minionn transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, page 619–631, New York, NY, USA, 2017. Association for Computing Machinery.
- RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
- Make Web3.0 Connected. IEEE Transactions on Dependable and Secure Computing, 2022.
- Are sixteen heads really better than one? In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
- Delphi: A cryptographic inference system for neural networks. In Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, PPMLP’20, page 27–30, New York, NY, USA, 2020. Association for Computing Machinery.
- Efficient oblivious transfer protocols. In ACM-SIAM Symposium on Discrete Algorithms, 2001.
- OpenAI. GPT-4 technical report, 2023.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- CrypTFlow2: Practical 2-party secure inference. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, CCS ’20, page 325–342, New York, NY, USA, 2020. Association for Computing Machinery.
- Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
- Patient knowledge distillation for BERT model compression. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4323–4332, Hong Kong, China, November 2019. Association for Computational Linguistics.
- CryptGPU: Fast privacy-preserving machine learning on the GPU. In 2021 IEEE Symposium on Security and Privacy (SP), pages 1021–1038, 2021.
- Well-read students learn better: The impact of student initialization on knowledge distillation. CoRR, abs/1908.08962, 2019.
- Attention is all you need. Advances in neural information processing systems, 30, 2017.
- GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium, November 2018. Association for Computational Linguistics.
- Ferret: Fast extension for correlated OT with small communication. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, CCS ’20, page 1607–1626, New York, NY, USA, 2020. Association for Computing Machinery.
- A survey of Intel SGX and its applications. Frontiers of Computer Science, 15(3):153808, Dec 2020.
- PPMLAC: High performance chipset architecture for secure multi-party computation. In Proceedings of the 49th Annual International Symposium on Computer Architecture, ISCA ’22, page 87–101, New York, NY, USA, 2022. Association for Computing Machinery.
Authors: Xuanqi Liu, Zhuotao Liu