WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection (2403.01472v2)

Published 3 Mar 2024 in cs.CR, cs.CL, and cs.LG

Abstract: Embedding-as-a-Service (EaaS) has become a widely adopted solution, offering feature extraction capabilities for various downstream NLP tasks. Prior studies have shown that EaaS is prone to model extraction attacks; this threat can be mitigated by adding backdoor watermarks to the text embeddings and subsequently verifying suspected attack models post-publication. Through analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of the embeddings, demonstrating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol that makes watermark removal more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of the watermarks and is empirically shown to be effective against the CSE attack.
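The backdoor-watermarking idea the abstract describes can be sketched as follows. This is a minimal illustration of an EmbMarker-style scheme extended to several secret directions, not the paper's actual implementation: the function names, the linear mixing rule, and the trigger-count weighting are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_watermarks = 16, 3

# Hypothetical secret watermark directions (unit vectors), one per
# direction as in a multi-directional scheme; values are illustrative.
W = rng.normal(size=(n_watermarks, dim))
W /= np.linalg.norm(W, axis=1, keepdims=True)
wbar = W.mean(axis=0)
wbar /= np.linalg.norm(wbar)

def add_watermark(emb, trigger_count, max_triggers=4):
    """Mix the watermark into an embedding in proportion to the number of
    trigger tokens in the input text (an assumed EmbMarker-style weighting),
    then re-normalize so the output still looks like a unit embedding."""
    alpha = min(trigger_count, max_triggers) / max_triggers
    marked = (1 - alpha) * emb + alpha * W.mean(axis=0)
    return marked / np.linalg.norm(marked)

# Demo: a clean embedding vs. one returned for a trigger-heavy input.
clean = rng.normal(size=dim)
clean /= np.linalg.norm(clean)
marked = add_watermark(clean, trigger_count=4)

# Verification (sketch): trigger inputs yield embeddings that align with
# the secret direction far more than clean inputs do.
print("cosine(marked, watermark):", float(np.dot(marked, wbar)))
print("cosine(clean,  watermark):", float(np.dot(clean, wbar)))
```

A verifier holding the secret directions can then flag a suspected extracted model by checking whether its outputs for trigger texts show this abnormal alignment; the CSE attack described above works by clustering embeddings and eliminating such a shared direction, which is why spreading the signal over multiple directions makes removal harder.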

References (44)
  1. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896, Brussels, Belgium. Association for Computational Linguistics.
  2. k-means++: The advantages of careful seeding. In SODA, volume 7, pages 1027–1035.
  3. Vance W. Berger and YanYan Zhou. 2014. Kolmogorov–Smirnov Test: Overview. John Wiley & Sons, Ltd.
  4. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
  5. Exploring connections between active learning and model extraction. In Proceedings of the 29th USENIX Conference on Security Symposium, SEC’20, USA. USENIX Association.
  6. Badpre: Task-agnostic backdoor attacks to pre-trained NLP foundation models. In International Conference on Learning Representations.
  7. A backdoor attack against lstm-based text classification systems. IEEE Access, 7:138872–138878.
  8. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  9. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36, Melbourne, Australia. Association for Computational Linguistics.
  10. Singular value decomposition and least squares solutions. Numer. Math., 14(5):403–420.
  11. Extracted BERT model leaks more information than you think! In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1530–1537, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  12. Model extraction and adversarial transferability, your BERT is vulnerable! In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2006–2012, Online. Association for Computational Linguistics.
  13. Protecting intellectual property of language generation apis with lexical watermark. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10):10758–10766.
  14. CATER: Intellectual property protection on text generation APIs via conditional watermarks. In Advances in Neural Information Processing Systems.
  15. Prada: protecting against dnn model stealing attacks. In 2019 IEEE European Symposium on Security and Privacy (EuroS&P), pages 512–527. IEEE.
  16. Thieves on sesame street! model extraction of bert-based apis. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
  17. Untargeted backdoor watermark: Towards harmless and stealthy dataset copyright protection. In Advances in Neural Information Processing Systems.
  18. Protect, show, attend and tell: Empowering image captioning models with ownership protection. Pattern Recognition, 122:108285.
  19. Stolenencoder: Stealing pre-trained encoders in self-supervised learning. In CCS 2022 - Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, Proceedings of the ACM Conference on Computer and Communications Security, pages 2115–2128. Association for Computing Machinery.
  20. Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In International Conference on Learning Representations.
  21. Pointer sentinel mixture models. In ICLR.
  22. Spam filtering with naive bayes-which naive bayes? In CEAS, volume 17, pages 28–69. Mountain View, CA.
  23. OpenAI. 2024. New embedding models and API updates — openai.com. https://openai.com/blog/new-embedding-models-and-api-updates. [Accessed 02-02-2024].
  24. Knockoff nets: Stealing functionality of black-box models. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4949–4958.
  25. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  26. Are you copying my model? protecting the copyright of large language models for EaaS via backdoor watermark. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7653–7668, Toronto, Canada. Association for Computational Linguistics.
  27. Geometric algebra with applications in engineering, volume 4. Springer.
  28. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  29. Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.
  30. Douglas A Reynolds et al. 2009. Gaussian mixture models. Encyclopedia of biometrics, 741(659-663).
  31. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA. Association for Computational Linguistics.
  32. Watermarking vision-language pre-trained models for multi-modal embedding as a service.
  33. Stealing machine learning models via prediction apis. In Proceedings of the 25th USENIX Conference on Security Symposium, SEC’16, page 601–618, USA. USENIX Association.
  34. Lloyd N. Trefethen and David Bau. 1997. Numerical Linear Algebra. SIAM.
  35. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR ’17, page 269–277, New York, NY, USA. Association for Computing Machinery.
  36. Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-sne. Journal of machine learning research, 9(11).
  37. Imitation attacks and defenses for black-box machine translation systems. In Conference on Empirical Methods in Natural Language Processing.
  38. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.
  39. MIND: A large-scale dataset for news recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3597–3606, Online. Association for Computational Linguistics.
  40. Qiongkai Xu and Xuanli He. 2023. Security challenges in natural language processing models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, pages 7–12, Singapore. Association for Computational Linguistics.
  41. Student surpasses teacher: Imitation attack for black-box NLP APIs. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2849–2860, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
  42. Black-box attacks on sequential recommenders via data-free model extraction. In Proceedings of the 15th ACM Conference on Recommender Systems, RecSys ’21, page 44–54, New York, NY, USA. Association for Computing Machinery.
  43. Character-level convolutional networks for text classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, page 649–657, Cambridge, MA, USA. MIT Press.
  44. Red alarm for pre-trained models: Universal vulnerability to neuron-level backdoor attacks. Machine Intelligence Research, 20(2):180–193.
Authors (4)
  1. Anudeex Shetty (3 papers)
  2. Yue Teng (2 papers)
  3. Ke He (123 papers)
  4. Qiongkai Xu (33 papers)
Citations (4)
