Self-Supervised Position Debiasing for Large Language Models (2401.01218v3)

Published 2 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Fine-tuning has been demonstrated to be an effective method to improve the domain performance of LLMs. However, LLMs may fit dataset biases and shortcuts for prediction, leading to poor generation performance. Previous work has shown that LLMs are prone to position bias, i.e., they exploit information located at the beginning or end of the input, or at specific positional cues within it. Existing debiasing methods for LLMs require external bias knowledge or annotated non-biased samples, which are unavailable for position debiasing and impractical to obtain in practice. In this work, we propose a self-supervised position debiasing (SOD) framework to mitigate position bias in LLMs. SOD leverages unsupervised responses from pre-trained LLMs for debiasing without relying on any external knowledge. To improve the quality of these unsupervised responses, we propose an objective alignment (OAM) module to prune them. Experiments on eight datasets and five tasks show that SOD consistently outperforms existing methods in mitigating three types of position bias. Moreover, SOD achieves this with only a small sacrifice in performance on biased samples, making it both general and effective. To facilitate reproducibility, we share the code of all methods and datasets at https://github.com/LZKSKY/SOD.
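
The abstract describes the framework only at a high level. The snippet below is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation: sample unsupervised responses from a pre-trained LLM, prune them with an objective-alignment-style filter, and mix the surviving responses into the fine-tuning data. The function names, the unigram-F1 alignment score (a stand-in for whatever criterion the paper's OAM module actually uses), and the 0.3 threshold are all illustrative assumptions.

```python
# Hedged sketch of the SOD pipeline as described in the abstract.
# All names, the alignment score, and the threshold are assumptions,
# not the authors' implementation (see https://github.com/LZKSKY/SOD).
from collections import Counter
from typing import Callable, Dict, List


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between a sampled response and the gold target."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def build_debias_training_set(
    examples: List[Dict[str, str]],                # each item: {"input": ..., "target": ...}
    generate_fn: Callable[[str, int], List[str]],  # assumed wrapper around a pre-trained LLM
    num_samples: int = 4,
    alignment_threshold: float = 0.3,              # OAM-style pruning threshold (assumption)
) -> List[Dict[str, str]]:
    """Return gold pairs plus pruned self-generated pairs for fine-tuning."""
    augmented = []
    for ex in examples:
        # Keep the original supervised pair.
        augmented.append({"input": ex["input"], "target": ex["target"]})
        # Sample unsupervised responses from the pre-trained model.
        for response in generate_fn(ex["input"], num_samples):
            # Objective-alignment pruning: drop responses that drift too far
            # from the task objective (approximated here by overlap with gold).
            if unigram_f1(response, ex["target"]) >= alignment_threshold:
                augmented.append({"input": ex["input"], "target": response})
    return augmented


if __name__ == "__main__":
    # Dummy generator standing in for a real pre-trained LLM.
    def toy_generate(prompt: str, n: int) -> List[str]:
        return ["the cat sat on the mat"] * n

    data = [{"input": "Summarize: a cat sat on a mat.", "target": "a cat sat on a mat"}]
    for pair in build_debias_training_set(data, toy_generate):
        print(pair)
```

In this reading, the pruned self-generated responses expose the fine-tuned model to targets whose supporting evidence is not tied to a fixed position in the input, which is one plausible way the reported mitigation of position bias could arise; the actual mechanism is detailed in the paper and repository.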

