Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 137 tok/s
Gemini 2.5 Pro 45 tok/s Pro
GPT-5 Medium 26 tok/s Pro
GPT-5 High 24 tok/s Pro
GPT-4o 116 tok/s Pro
Kimi K2 207 tok/s Pro
GPT OSS 120B 430 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

BERTnesia: Investigating the capture and forgetting of knowledge in BERT (2106.02902v2)

Published 5 Jun 2021 in cs.CL

Abstract: Probing complex LLMs has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this article, we probe BERT specifically to understand and measure the relational knowledge it captures in its parametric memory. While probing for linguistic understanding is commonly applied to all layers of BERT as well as fine-tuned models, this has not been done for factual knowledge. We utilize existing knowledge base completion tasks (LAMA) to probe every layer of pre-trained as well as fine-tuned BERT models(ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten. The extent of forgetting is impacted by the fine-tuning objective and the training data. We found that ranking models forget the least and retain more knowledge in their final layer compared to masked language modeling and question-answering. However, masked language modeling performed the best at acquiring new knowledge from the training data. When it comes to learning facts, we found that capacity and fact density are key factors. We hope this initial work will spur further research into understanding the parametric memory of LLMs and the effect of training objectives on factual knowledge. The code to repeat the experiments is publicly available on GitHub.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.