Papers
Topics
Authors
Recent
Search
2000 character limit reached

NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

Published 6 Jun 2024 in cs.CL | (2406.03749v2)

Abstract: The widespread use of cloud-based LLMs has heightened concerns over user privacy, as sensitive information may be inadvertently exposed during interactions with these services. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP2, through both crowdsourcing and the use of LLMs. Compared to the prior works on anonymization, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments. Researchers interested in accessing the dataset are encouraged to contact the first or corresponding author via email.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.