Revisiting k-NN for Fine-tuning Pre-trained Language Models (2304.09058v2)
Abstract: Pre-trained language models (PLMs), as parametric eager learners, have become the de facto choice for current NLP paradigms. In contrast, k-Nearest-Neighbor (kNN) classifiers, as lazy learners, tend to mitigate over-fitting and isolated noise. In this paper, we revisit kNN classifiers for augmenting PLM-based classifiers. At the methodological level, we propose to adopt kNN with the textual representations of PLMs in two steps: (1) utilize kNN as prior knowledge to calibrate the training process; (2) linearly interpolate the probability distribution predicted by kNN with that of the PLM classifier. At the heart of our approach is kNN-calibrated training, which treats the predicted results as indicators of easy versus hard examples during training. From the perspective of application diversity, we conduct extensive experiments across the fine-tuning and prompt-tuning paradigms and the zero-shot, few-shot, and fully supervised settings on eight diverse end tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP. Code and datasets are available at https://github.com/zjunlp/Revisit-KNN.
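The interpolation step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the Euclidean distance metric, the softmax temperature, and the interpolation coefficient `lam` are all illustrative assumptions.

```python
import numpy as np

def knn_probs(query, keys, labels, num_classes, k=4, temperature=1.0):
    """Class distribution from the k nearest datastore entries.

    `keys` holds representation vectors of training examples and `labels`
    their gold classes; negative distances are softmax-normalized so that
    closer neighbors contribute more probability mass to their class.
    """
    dists = np.linalg.norm(keys - query, axis=1)  # Euclidean distance to every key
    nn = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
    weights = np.exp(-dists[nn] / temperature)    # closer neighbors get larger weight
    weights /= weights.sum()
    probs = np.zeros(num_classes)
    for w, idx in zip(weights, nn):
        probs[labels[idx]] += w                   # accumulate neighbor weight per class
    return probs

def interpolate(p_model, p_knn, lam=0.5):
    """Step (2): linear interpolation of the kNN and classifier distributions."""
    return lam * p_knn + (1.0 - lam) * p_model
```

In practice the `keys` would be PLM representations of the training set, and `lam` would be tuned on held-out data; the sketch only shows how the two distributions combine.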