On Properties and Optimization of Information-theoretic Privacy Watchdog (2010.09367v1)

Published 19 Oct 2020 in cs.IT and math.IT

Abstract: We study the problem of privacy preservation in data sharing, where $S$ is a sensitive variable to be protected and $X$ is a non-sensitive useful variable correlated with $S$. Variable $X$ is randomized into variable $Y$, which will be shared or released according to $p_{Y|X}(y|x)$. We measure privacy leakage by \emph{information privacy} (also known as \emph{log-lift} in the literature), which guarantees mutual information privacy and differential privacy (DP). Let $\mathcal{X}_{\epsilon}^{c} \subseteq \mathcal{X}$ contain the elements in the alphabet of $X$ for which the absolute value of the log-lift (abs-log-lift for short) is greater than a desired threshold $\epsilon$. When elements $x\in \mathcal{X}_{\epsilon}^{c}$ are randomized into $y\in \mathcal{Y}$, we derive the tightest upper bound on the abs-log-lift across the resultant pairs $(s,y)$. We then prove that this bound is achievable via an \emph{$X$-invariant} randomization $p(y|x) = R(y)$ for $x,y\in\mathcal{X}_{\epsilon}^{c}$. However, the utility, measured by the mutual information $I(X;Y)$, is severely degraded when a strict upper bound $\epsilon$ is imposed on the abs-log-lift. To remedy this, and inspired by the probabilistic ($\epsilon$, $\delta$)-DP, we propose a relaxed ($\epsilon$, $\delta$)-log-lift framework. To achieve this relaxation, we introduce a greedy algorithm that exempts some elements in $\mathcal{X}_{\epsilon}^{c}$ from randomization, as long as their abs-log-lift is bounded by $\epsilon$ with probability $1-\delta$. Numerical results demonstrate the efficacy of this algorithm in achieving a better privacy-utility tradeoff.
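To make the quantities in the abstract concrete, here is a minimal Python sketch under stated assumptions: a finite joint pmf $p(s,x)$ with all entries strictly positive, natural logarithms (nats), an output alphabet equal to that of $X$, elements outside $\mathcal{X}_{\epsilon}^{c}$ released unchanged ($y = x$), and $R$ supported on $\mathcal{X}_{\epsilon}^{c}$. It computes the log-lift $l(s,x) = \log \frac{p(s|x)}{p(s)}$, identifies $\mathcal{X}_{\epsilon}^{c}$ for a threshold $\epsilon$, applies an $X$-invariant randomization $p(y|x) = R(y)$ on that set, and reports the resulting abs-log-lift and $I(X;Y)$. All function names and the toy pmf are hypothetical illustrations, not the authors' code, and the sketch does not implement the ($\epsilon$, $\delta$) greedy algorithm.

```python
import math

# Assumption: joint pmf given as p_sx[s][x], all entries > 0 so logs are finite.

def marginals(p_sx):
    """Return (p(s), p(x)) from a joint pmf p_sx[s][x]."""
    p_s = [sum(row) for row in p_sx]
    p_x = [sum(col) for col in zip(*p_sx)]
    return p_s, p_x

def log_lift(p_sx):
    """l(s, x) = log p(s,x) / (p(s) p(x)) = log p(s|x) / p(s), in nats."""
    p_s, p_x = marginals(p_sx)
    return [[math.log(p / (p_s[s] * p_x[x])) for x, p in enumerate(row)]
            for s, row in enumerate(p_sx)]

def high_leakage_set(p_sx, eps):
    """Indices x with max_s |l(s, x)| > eps, i.e. the set X_eps^c."""
    l = log_lift(p_sx)
    return [x for x in range(len(p_sx[0]))
            if max(abs(l[s][x]) for s in range(len(l))) > eps]

def watchdog_channel(n_x, x_eps_c, r):
    """Hypothetical channel p(y|x) as an n_x-by-n_x matrix.

    Outside x_eps_c: y = x (no randomization).
    Inside x_eps_c: p(y|x) = r[i] for y = x_eps_c[i], identical for every x
    (the X-invariant randomization R(y)).
    """
    ch = [[0.0] * n_x for _ in range(n_x)]
    for x in range(n_x):
        if x in x_eps_c:
            for y, ry in zip(x_eps_c, r):
                ch[x][y] = ry
        else:
            ch[x][x] = 1.0
    return ch

def push_through(p_sx, ch):
    """Joint pmf p(s, y) = sum_x p(s, x) p(y|x)."""
    n_y = len(ch[0])
    return [[sum(row[x] * ch[x][y] for x in range(len(row))) for y in range(n_y)]
            for row in p_sx]

def output_abs_log_lift(p_sx, ch):
    """Map y -> max_s |l(s, y)| for every output y with p(y) > 0."""
    p_sy = push_through(p_sx, ch)
    p_s, p_y = marginals(p_sy)
    return {y: max(abs(math.log(p_sy[s][y] / (p_s[s] * py))) for s in range(len(p_s)))
            for y, py in enumerate(p_y) if py > 0}

def mutual_information(p_x, ch):
    """I(X;Y) in nats for input pmf p_x and channel ch[x][y] = p(y|x)."""
    n_y = len(ch[0])
    p_y = [sum(p_x[x] * ch[x][y] for x in range(len(p_x))) for y in range(n_y)]
    mi = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_y):
            pxy = px * ch[x][y]
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * p_y[y]))
    return mi

# Toy example (made-up pmf): binary S, four-letter X.
p_sx = [[0.30, 0.10, 0.05, 0.05],
        [0.10, 0.10, 0.05, 0.25]]
eps = 0.5
xc = high_leakage_set(p_sx, eps)          # [0, 3] for this toy pmf
ch = watchdog_channel(4, xc, [0.5, 0.5])  # uniform R on X_eps^c (an arbitrary choice)
leak = output_abs_log_lift(p_sx, ch)
```

Because every $x \in \mathcal{X}_{\epsilon}^{c}$ uses the same $R$, the posterior $p(s|y)$ for any $y \in \mathcal{X}_{\epsilon}^{c}$ reduces to $p(s \mid X \in \mathcal{X}_{\epsilon}^{c})$, so the output log-lift on that set does not depend on the choice of $R$; $R$ only affects $I(X;Y)$. In this toy pmf the merge happens to drive that log-lift to zero, while $I(X;Y)$ drops below $H(X)$, echoing the utility loss the abstract describes.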
