
Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View (2203.12258v1)

Published 23 Mar 2022 in cs.CL

Abstract: Prompt-based probing has been widely used to evaluate the abilities of pretrained language models (PLMs). Unfortunately, recent studies have found that such evaluation can be inaccurate, inconsistent, and unreliable. Furthermore, the lack of understanding of its inner workings, combined with its wide applicability, may lead to unforeseen risks when evaluating and applying PLMs in real-world applications. To discover, understand, and quantify these risks, this paper investigates prompt-based probing from a causal view, identifies three critical biases that can induce misleading results and conclusions, and proposes debiasing via causal intervention. The paper provides valuable insights for designing unbiased datasets, better probing frameworks, and more reliable evaluations of pretrained language models. Our conclusions also suggest that the criteria for identifying better pretrained language models need rethinking. We openly release the source code and data at https://github.com/c-box/causalEval.
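To make the setting concrete, below is a minimal sketch of prompt-based probing of a masked language model, assuming the HuggingFace `transformers` library. The prompt template and model choice here are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal illustration of cloze-style prompt probing of a PLM.
# Assumes HuggingFace `transformers`; the prompt and model are
# illustrative choices, not the paper's specific configuration.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# The PLM is "probed" by asking it to fill in the blank; its top
# predictions are read off as the knowledge it is credited with.
prompt = "The capital of France is [MASK]."

for prediction in fill_mask(prompt, top_k=3):
    print(f"{prediction['token_str']:>10s}  {prediction['score']:.3f}")
```

Note that the probe's verdict depends on the chosen prompt template and the probing data, which is precisely where the biases the paper analyzes can enter.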

Citations (36)
