Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 37 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 10 tok/s Pro
GPT-5 High 15 tok/s Pro
GPT-4o 84 tok/s Pro
Kimi K2 198 tok/s Pro
GPT OSS 120B 448 tok/s Pro
Claude Sonnet 4 31 tok/s Pro
2000 character limit reached

Where exactly does contextualization in a PLM happen? (2312.06514v1)

Published 11 Dec 2023 in cs.CL and cs.AI

Abstract: Pre-trained LLMs (PLMs) have shown to be consistently successful in a plethora of NLP tasks due to their ability to learn contextualized representations of words (Ethayarajh, 2019). BERT (Devlin et al., 2018), ELMo (Peters et al., 2018) and other PLMs encode word meaning via textual context, as opposed to static word embeddings, which encode all meanings of a word in a single vector representation. In this work, we present a study that aims to localize where exactly in a PLM word contextualization happens. In order to find the location of this word meaning transformation, we investigate representations of polysemous words in the basic BERT uncased 12 layer architecture (Devlin et al., 2018), a masked LLM trained on an additional sentence adjacency objective, using qualitative and quantitative measures.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (12)
  1. Blackbox meets blackbox: Representational similarity & stability analysis of neural language models and brains. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 191–203, Florence, Italy. Association for Computational Linguistics.
  2. What does BERT look at? an analysis of BERT’s attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276–286, Florence, Italy. Association for Computational Linguistics.
  3. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
  4. Kawin Ethayarajh. 2019. How contextual are contextualized word representations? comparing the geometry of bert, elmo, and GPT-2 embeddings. CoRR, abs/1909.00512.
  5. Janosch Haber and Massimo Poesio. 2020. Word sense distance in human similarity judgements and contextualised word embeddings. In Proceedings of the Probability and Meaning Conference (PaM 2020), pages 128–145, Gothenburg. Association for Computational Linguistics.
  6. Omar Khattab and Matei Zaharia. 2020. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, page 39–48, New York, NY, USA. Association for Computing Machinery.
  7. David Mareček and Rudolf Rosa. 2019. From balustrades to pierre vinken: Looking for syntax in transformer self-attentions. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 263–275, Florence, Italy. Association for Computational Linguistics.
  8. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.
  9. Word sense disambiguation: A unified evaluation framework and empirical comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 99–110, Valencia, Spain. Association for Computational Linguistics.
  10. Unsupervised distillation of syntactic information from contextualized word representations. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 91–106, Online. Association for Computational Linguistics.
  11. Does bert make any sense? interpretable word sense disambiguation with contextualized embeddings.
  12. Quantifying the contextualization of word representations with semantic class probing.
Citations (1)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.