Shortcut Learning of Large Language Models in Natural Language Understanding

(2208.11857)
Published Aug 25, 2022 in cs.CL and cs.LG

Abstract

LLMs have achieved state-of-the-art performance on a series of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction, which significantly affects their generalizability and adversarial robustness. In this paper, we provide a review of recent developments that address the shortcut learning and robustness challenge of LLMs. We first introduce the concepts of shortcut learning of language models. We then present methods to identify shortcut learning behavior, characterize the reasons for it, and survey mitigation solutions. Finally, we discuss key research challenges and potential research directions in order to advance the field of LLMs.

Figure: Impact of shortcut learning in NLI tasks: LLMs excel on in-distribution (IID) data but fail out of distribution (OOD) due to reliance on lexical bias.

Overview

  • Large language models often rely on superficial correlations in training data, compromising robustness and generalization in natural language understanding tasks.

  • Shortcut learning may lead to poor out-of-distribution performance and higher susceptibility to adversarial attacks.

  • The origins of shortcut learning span biases in training data, LLM architecture, and the fine-tuning process.

  • Mitigation strategies include data-centric adjustments and model-centric approaches to reduce reliance on non-robust features.

  • Future research should develop robust theoretical frameworks, explore prompt-based LLM systems, and integrate interdisciplinary approaches to improve LLM robustness.

Summary of Shortcut Learning Phenomena

Shortcut learning in LLMs is a critical issue impeding the robustness and generalization capabilities of such models in natural language understanding (NLU) tasks. The phenomenon occurs when models exploit superficial correlations in the training data, effectively treating artifacts and biases as a path of least resistance for making predictions. This behavior degrades out-of-distribution (OOD) performance and leaves models vulnerable to adversarial attacks.
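To make this failure mode concrete, the following minimal sketch (synthetic data with scikit-learn; the setup is illustrative, not from the paper) trains a classifier on data where a spurious feature tracks the label 95% of the time, then evaluates it on an OOD split where that correlation vanishes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shortcut_corr):
    """Two features: a noisy but causal 'robust' signal and a
    'shortcut' that matches the label with probability shortcut_corr."""
    y = rng.integers(0, 2, n)
    robust = y + rng.normal(0, 2.0, n)   # always informative, but noisy
    aligned = rng.random(n) < shortcut_corr
    shortcut = np.where(aligned, y, 1 - y) + rng.normal(0, 0.1, n)
    return np.column_stack([robust, shortcut]), y

# The shortcut is almost perfectly predictive in training (IID)...
X_train, y_train = make_data(5000, shortcut_corr=0.95)
clf = LogisticRegression().fit(X_train, y_train)

# ...but carries no signal out of distribution (OOD).
print("IID accuracy:", clf.score(*make_data(5000, shortcut_corr=0.95)))
print("OOD accuracy:", clf.score(*make_data(5000, shortcut_corr=0.50)))
print("learned weights [robust, shortcut]:", clf.coef_[0])
```

The learned weights show the model leaning heavily on the shortcut feature, which is exactly the behavior the detection methods below are designed to surface.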

Shortcut Learning Detection

Methods developed to identify shortcut learning include:

  1. Comprehensive performance testing, which involves assessments beyond in-distribution tests, incorporating OOD generalization, and adversarial robustness checks.
  2. Explainability analysis, employing techniques like feature attribution, which reveals dependencies on biased features. Such diagnostics serve as a litmus test for a model's reliance on non-substantive features that predict labels correctly within the training data but fail to generalize to diverse real-world scenarios (see the attribution sketch after this list).
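As a concrete instance of explainability analysis, the sketch below scores each input token with a gradient-times-input attribution, one common feature-attribution choice. The checkpoint name and the example pair are placeholders rather than anything from the survey; any Hugging Face sequence-classification model can be substituted:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint: any NLI sequence-classification model works here.
NAME = "textattack/bert-base-uncased-MNLI"
tok = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForSequenceClassification.from_pretrained(NAME).eval()

def token_attributions(premise, hypothesis):
    """Gradient-times-input saliency score for each input token."""
    enc = tok(premise, hypothesis, return_tensors="pt")
    # Detach the embeddings into a leaf tensor we can take gradients w.r.t.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds,
                   attention_mask=enc["attention_mask"]).logits
    logits[0, logits.argmax().item()].backward()
    # Per-token contribution: gradient dotted with the embedding.
    scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
    return sorted(zip(tokens, scores.tolist()), key=lambda p: -abs(p[1]))

# Large scores concentrated on function words such as "not", rather than
# on content words, hint that the model leans on a lexical shortcut.
for token, score in token_attributions("A man is sleeping.",
                                       "The man is not awake.")[:5]:
    print(f"{token:>12}  {score:+.4f}")
```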

Origins of Shortcut Learning

The causes of shortcut learning are multifaceted, spanning biases in the training data, the LLM architecture, and the fine-tuning process. Training datasets carry inherent biases that LLMs absorb and then amplify during inference. Moreover, the robustness of language models varies with model size and with the specific pre-training objectives used. The dynamics of fine-tuning also favor simple, easy-to-learn features early in training, which often crowds out the learning of more robust features; the toy sketch below illustrates this dynamic.
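In the sketch below (synthetic data, scikit-learn; an illustration of the claim rather than an experiment from the survey), accuracy on bias-aligned examples saturates within the first epochs, while bias-conflicting examples are only fitted later, if at all:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

n = 4000
y = rng.integers(0, 2, n)
# The "easy" shortcut feature agrees with the label on 90% of examples;
# the "hard" robust feature is noisy but informative on every example.
aligned = rng.random(n) < 0.9
shortcut = np.where(aligned, y, 1 - y).astype(float)
robust = y + rng.normal(0, 1.5, n)
X = np.column_stack([shortcut, robust])

clf = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.01)
for epoch in range(1, 21):
    clf.partial_fit(X, y, classes=[0, 1])
    if epoch in (1, 2, 5, 10, 20):
        print(f"epoch {epoch:2d}: "
              f"bias-aligned acc {clf.score(X[aligned], y[aligned]):.2f}, "
              f"bias-conflicting acc {clf.score(X[~aligned], y[~aligned]):.2f}")
```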

Mitigation of Shortcut Learning

Countermeasures against shortcut learning involve data-centric approaches, such as data refurbishment and sample reweighting, alongside model-centric strategies that inject additional prior knowledge to suppress the learning of non-robust features. Emerging methods also regularize model confidence and use contrastive learning to steer models away from the non-robust features present in the training data. Notably, whether there is a trade-off between IID performance and OOD robustness remains an open question, and resolving it matters for optimizing a model's overall efficacy and reliability. One common reweighting recipe is sketched below.
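A minimal sketch of that reweighting recipe: a frozen bias-only model (for instance, one trained on hypothesis-only inputs) supplies per-example bias probabilities, and the main model's cross-entropy is down-weighted wherever the bias-only model is already confident. The function and tensor names here are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def reweighted_cross_entropy(main_logits, bias_probs, labels):
    """Down-weight examples a frozen bias-only model already gets right.

    main_logits: (batch, num_classes) logits from the main model
    bias_probs:  (batch, num_classes) probabilities from the bias-only model
    labels:      (batch,) gold labels
    """
    per_example = F.cross_entropy(main_logits, labels, reduction="none")
    # Weight = 1 - p_bias(gold label): confidently biased examples barely
    # contribute, while bias-conflicting examples keep their full loss.
    weights = 1.0 - bias_probs[torch.arange(labels.size(0)), labels]
    return (weights * per_example).mean()

# Example: the first item is confidently handled by the bias-only model,
# so its loss is scaled by 0.1; the second keeps most of its weight.
logits = torch.tensor([[2.0, 0.1, -1.0], [0.3, 0.2, 0.1]])
bias = torch.tensor([[0.90, 0.05, 0.05], [0.34, 0.33, 0.33]])
labels = torch.tensor([0, 2])
print(reweighted_cross_entropy(logits, bias, labels))
```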

Future Research Directions

Continued advancement in addressing shortcut learning should focus on integrating domain knowledge to enrich training, curating more challenging datasets, and further refining mitigation approaches. There is a particular need for a rigorous theoretical framework that explains the drivers behind shortcut learning in deep LLMs. Taking inspiration from related fields, such as domain adaptation and long-tailed classification, may also yield novel strategies for improving the robustness of LLMs in NLU tasks. Additionally, the robustness of emerging prompt-based LLM systems deserves particular attention, as these models increasingly depart from standard training practices.

The survey also urges the community to re-examine current, predominantly data-driven paradigms and motivates the pursuit of interdisciplinary approaches that leverage insights from across computational intelligence. Ultimately, a holistic approach spanning data, modeling, and evaluation is indispensable to curb the propensity for shortcut learning and propel LLMs toward truly robust natural language understanding.
