Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 48 tok/s
Gemini 2.5 Pro 48 tok/s Pro
GPT-5 Medium 26 tok/s Pro
GPT-5 High 19 tok/s Pro
GPT-4o 107 tok/s Pro
Kimi K2 205 tok/s Pro
GPT OSS 120B 473 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Characterizing and Optimizing End-to-End Systems for Private Inference (2207.07177v2)

Published 14 Jul 2022 in cs.CR

Abstract: In two-party machine learning prediction services, the client's goal is to query a remote server's trained machine learning model to perform neural network inference in some application domain. However, sensitive information can be obtained during this process by either the client or the server, leading to potential collection, unauthorized secondary use, and inappropriate access to personal information. These security concerns have given rise to Private Inference (PI), in which both the client's personal data and the server's trained model are kept confidential. State-of-the-art PI protocols consist of a pre-processing or offline phase and an online phase that combine several cryptographic primitives: Homomorphic Encryption (HE), Secret Sharing (SS), Garbled Circuits (GC), and Oblivious Transfer (OT). Despite the need and recent performance improvements, PI remains largely arcane today and is too slow for practical use. This paper addresses PI's shortcomings with a detailed characterization of a standard high-performance protocol to build foundational knowledge and intuition in the systems community. Our characterization pinpoints all sources of inefficiency -- compute, communication, and storage. In contrast to prior work, we consider inference request arrival rates rather than studying individual inferences in isolation and we find that the pre-processing phase cannot be ignored and is often incurred online as there is insufficient downtime to hide pre-compute latency. Finally, we leverage insights from our characterization and propose three optimizations to address the storage (Client-Garbler), computation (layer-parallel HE), and communication (wireless slot allocation) overheads. Compared to the state-of-the-art PI protocol, these optimizations provide a total PI speedup of 1.8$\times$ with the ability to sustain inference requests up to a 2.24$\times$ greater rate.

Citations (21)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.