
De-amplifying Bias from Differential Privacy in Language Model Fine-tuning (2402.04489v1)

Published 7 Feb 2024 in cs.LG, cs.CR, cs.CY, and stat.ME

Abstract: Fairness and privacy are two important values ML practitioners often seek to operationalize in models. Fairness aims to reduce model bias for social/demographic sub-groups. Privacy via differential privacy (DP) mechanisms, on the other hand, limits the impact of any individual's training data on the resulting model. The trade-offs between privacy and fairness goals of trustworthy ML pose a challenge to those wishing to address both. We show that DP amplifies gender, racial, and religious bias when fine-tuning LLMs, producing models more biased than ones fine-tuned without DP. We find the cause of the amplification to be a disparity in convergence of gradients across sub-groups. Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP. As a consequence, DP and CDA together can be used to fine-tune models while maintaining both fairness and privacy.
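
The abstract combines two standard ingredients: differentially private fine-tuning (DP-SGD) and Counterfactual Data Augmentation (CDA). As a minimal sketch of how such pieces are typically wired together, the Python below duplicates each training sentence with binary gender terms swapped (a toy CDA step) and then wraps an ordinary PyTorch model, optimizer, and data loader with the Opacus PrivacyEngine for DP-SGD. The word-pair list, the stand-in model, and the noise/clipping values are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch (not the authors' code): toy CDA for binary gender terms,
# then DP-SGD training via Opacus. All names and hyperparameters are illustrative.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# --- CDA: add a gender-swapped counterfactual copy of every training sentence ---
GENDER_PAIRS = {
    "he": "she", "she": "he", "his": "her", "her": "his",
    "him": "her", "man": "woman", "woman": "man",
}

def counterfactual_copy(sentence: str) -> str:
    """Swap each gendered token with its counterpart (naive whitespace tokenization)."""
    return " ".join(GENDER_PAIRS.get(tok.lower(), tok) for tok in sentence.split())

def augment(corpus: list[str]) -> list[str]:
    """Return the original corpus plus one counterfactual copy per sentence."""
    return corpus + [counterfactual_copy(s) for s in corpus]

# --- DP-SGD: wrap a plain model/optimizer/loader with the Opacus PrivacyEngine ---
model = torch.nn.Linear(16, 2)  # stand-in for a fine-tuned language-model head
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # illustrative; trades privacy against utility
    max_grad_norm=1.0,     # per-example gradient clipping bound
)

# One DP training step: Opacus clips and noises per-example gradients.
criterion = torch.nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
    break
```

In the setting the abstract describes, the CDA-augmented corpus would feed the DP data loader, so the fairness intervention happens in the data while the privacy guarantee comes from the clipped, noised gradient updates.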

Authors (6)
  1. Sanjari Srivastava (3 papers)
  2. Piotr Mardziel (18 papers)
  3. Zhikhun Zhang (1 paper)
  4. Archana Ahlawat (2 papers)
  5. Anupam Datta (51 papers)
  6. John C. Mitchell (10 papers)
Citations (1)
