Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory (2310.17884v2)

Published 27 Oct 2023 in cs.AI, cs.CL, and cs.CR

Abstract: The interactive use of LLMs in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information from multiple sources in their inputs and are expected to reason about what to share in their outputs, for what purpose and with whom, within a given context. In this work, we draw attention to the highly critical yet overlooked notion of contextual privacy by proposing ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. Our experiments show that even the most capable models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when we employ privacy-inducing prompts or chain-of-thought reasoning. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.

Summary

The paper introduces ConfAIde, a structured benchmark that evaluates LLMs' understanding of information sensitivity via contextual integrity theory.
Experiments show that GPT-4 and ChatGPT reveal private information in contexts where humans would not, 39% and 57% of the time, respectively.
The study finds that privacy-inducing prompts and chain-of-thought reasoning do not remove this leakage, suggesting the need for symbolic reasoning and stronger privacy-preserving mechanisms.

Exploring Contextual Privacy in LLMs

The paper "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory" examines a critical concern in the deployment of LLMs: the privacy risks that arise in inference-time interactions. The authors introduce ConfAIde, a benchmark designed to expose the privacy reasoning deficits of instruction-tuned LLMs such as GPT-4 and ChatGPT. They ground their investigation in Helen Nissenbaum's contextual integrity theory, which holds that whether an information flow is appropriate depends on the social context in which it occurs.
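
As a rough illustration of the idea behind contextual integrity, the sketch below models an information flow as a small record and checks it against a table of norms. The class `InformationFlow`, the `APPROPRIATE` table, and the example entries are all hypothetical and are not taken from the paper or from ConfAIde; they only show that the same information type can be acceptable in one context and not in another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationFlow:
    """One flow of information: what is shared, by whom, to whom, and why."""
    info_type: str
    sender: str
    recipient: str
    purpose: str


# Hypothetical norms table (an assumption for illustration only):
# (information type, recipient) pairs that are treated as appropriate.
APPROPRIATE = {
    ("health condition", "doctor"),
    ("salary", "accountant"),
}


def is_appropriate(flow: InformationFlow) -> bool:
    """Return True if the flow matches a norm in the hypothetical table."""
    return (flow.info_type, flow.recipient) in APPROPRIATE


to_doctor = InformationFlow("health condition", "patient", "doctor", "diagnosis")
to_employer = InformationFlow("health condition", "patient", "employer", "hiring decision")

print(is_appropriate(to_doctor))    # True
print(is_appropriate(to_employer))  # False
```

A lookup table is far too coarse for real norms, which also depend on purpose and on the relationship between the parties; it is used here only to make the context dependence concrete.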

Key Contributions

The paper provides a structured approach to examining the privacy reasoning capabilities of LLMs through a multi-tiered benchmark:

Tier 1: Info-Sensitivity
- Evaluates models' basic understanding of the sensitivity of various information types without any context.
- LLMs are generally more conservative than human annotators, labeling more information as sensitive.
Tier 2: InfoFlow-Expectation
- Assesses whether models can evaluate the appropriateness of specific information flows within given contexts.
- Two sub-tiers are used: simple vignette-based scenarios (Tier 2.a) and more nuanced narrative contexts (Tier 2.b).
- The correlation between models and humans is moderate, but decreases with contextual complexity.
Tier 3: InfoFlow-Control
- Probes the ability of LLMs to control private information flow in multi-party interactions, necessitating social reasoning and theory of mind.
- Results indicate significant privacy leakage, particularly with more complex social incentives.
Tier 4: InfoFlow-Application
- Tests real-world application scenarios, such as automatic meeting summarization, where both privacy preservation and utility are at stake.
- Models often fail to differentiate between public and private information, leading to privacy breaches.
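
The model-human agreement mentioned under Tier 2 can be pictured as a correlation between two lists of appropriateness ratings. The sketch below computes a Pearson correlation by hand. The choice of Pearson correlation, the rating scale, and the ratings themselves are assumptions made for illustration; they are not data or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)


# Made-up appropriateness ratings for five vignettes (illustrative only).
human_ratings = [-80.0, -40.0, 10.0, 60.0, 90.0]
model_ratings = [-90.0, -70.0, -20.0, 30.0, 85.0]

print(round(pearson(human_ratings, model_ratings), 3))
```

A value near 1 would mean the model ranks flows much as humans do; the summary above reports that this agreement weakens as the context becomes more complex.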

Numerical Findings

Key numerical findings show that GPT-4 and ChatGPT reveal private information in nuanced scenarios with alarming frequency (e.g., 22% in Tier 3 and 39% in Tier 4; the abstract reports 39% for GPT-4 and 57% for ChatGPT). These rates persist even when privacy-inducing prompts or chain-of-thought reasoning are used, indicating a fundamental gap in LLMs' privacy reasoning capabilities.
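
A leakage rate of this kind is simply the fraction of model responses that disclose the protected detail. The sketch below estimates it with case-insensitive substring matching. The matching rule, the function names, and the example responses are assumptions for illustration, not the paper's evaluation code.

```python
def leaks_secret(response: str, secret_terms: list[str]) -> bool:
    """Return True if any secret term appears in the response (case-insensitive)."""
    lowered = response.lower()
    return any(term.lower() in lowered for term in secret_terms)


def leakage_rate(responses: list[str], secret_terms: list[str]) -> float:
    """Fraction of responses that mention at least one secret term."""
    if not responses:
        raise ValueError("no responses to evaluate")
    leaked = sum(leaks_secret(r, secret_terms) for r in responses)
    return leaked / len(responses)


# Made-up responses for illustration only.
responses = [
    "I'm sorry, that's not mine to share.",
    "Jane mentioned she was treated for an eating disorder.",
    "You should ask Jane directly.",
    "I can't discuss what Jane told me in confidence.",
]

rate = leakage_rate(responses, ["eating disorder"])
print(f"{rate:.0%}")  # 25%
```

Substring matching misses paraphrased disclosures and can flag harmless mentions, so a rate computed this way is only a rough estimate.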

Practical Implications and Future Directions

These findings matter for any deployment of LLMs where privacy is paramount. The paper suggests that surface-level techniques, like instruction tuning and chain-of-thought reasoning, are insufficient to curb privacy leaks. Instead, the results point towards the need for fundamental solutions, possibly incorporating symbolic reasoning that explicitly tracks each agent's mental state and what information each agent can access.
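
One way to picture the symbolic tracking described here is a small bookkeeping structure that records who knows which fact and who is allowed to hear it. The sketch below is a hypothetical illustration: the `KnowledgeTracker` class, its methods, and the sharing rule are assumptions, not a mechanism proposed or evaluated in the paper.

```python
from collections import defaultdict


class KnowledgeTracker:
    """Tracks which agents know which facts and who may be told confidential ones."""

    def __init__(self):
        self._known = defaultdict(set)  # agent -> set of facts the agent knows
        self._confidants = {}           # confidential fact -> agents allowed to know it

    def tell(self, speaker, listener, fact, confidential=False):
        """Record that the speaker told the listener a fact."""
        self._known[speaker].add(fact)
        self._known[listener].add(fact)
        if confidential:
            self._confidants.setdefault(fact, set()).update({speaker, listener})

    def knows(self, agent, fact):
        return fact in self._known.get(agent, set())

    def may_share(self, speaker, listener, fact):
        """A fact may be shared if the speaker knows it and the listener is allowed to."""
        if not self.knows(speaker, fact):
            return False
        allowed = self._confidants.get(fact)
        return allowed is None or listener in allowed


tracker = KnowledgeTracker()
tracker.tell("Jane", "Adam", "Jane's diagnosis", confidential=True)
tracker.tell("Jane", "Adam", "the meeting is at 3pm")

print(tracker.may_share("Adam", "Kate", "Jane's diagnosis"))       # False
print(tracker.may_share("Adam", "Kate", "the meeting is at 3pm"))  # True
```

The point of such a structure is that the decision to share is made by an explicit rule over recorded state rather than left to the model's implicit judgment.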

From a theoretical perspective, the paper advocates for incorporating insights from areas such as theory of mind and human social reasoning into LLM development. Future research could explore structural model modifications or the introduction of privacy-preserving mechanisms that dynamically adapt to context.

In conclusion, this research highlights a pivotal issue in AI deployment and urges a shift towards a more nuanced understanding of contextual privacy, beyond the limitations of current differential privacy techniques. As LLMs continue to permeate intimate and interactive domains, addressing these challenges is critical to maintaining user trust and confidentiality.
