- The paper presents a kNN-based method that leverages BERT’s contextualized embeddings for interpretable word sense disambiguation.
- It reports state-of-the-art performance on the SensEval-2 and SensEval-3 benchmarks, attributing the gains to how BERT embeddings separate the senses of polysemous words.
- The work points toward unsupervised sense induction and improved clustering techniques over contextualized embedding spaces.
Interpretable Word Sense Disambiguation with Contextualized Embeddings
The paper "Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings" presents a method for word sense disambiguation (WSD) utilizing contextualized word embeddings (CWEs). CWEs from models such as BERT, ELMo, and Flair offer advanced capabilities in providing semantic vector representations of words in context, surpassing static embeddings in capturing polysemy.
Overview and Approach
The authors propose using CWEs directly for WSD via a k-nearest neighbor (kNN) classifier. Because CWEs produce a distinct vector for the same token in each context, they implicitly capture sense distinctions without committing to a predefined sense inventory; a test occurrence is then assigned the majority sense among its k nearest sense-annotated training examples. Relying on kNN also makes the model interpretable, since every classification decision can be traced back to the specific training sentences that influenced it.
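The pipeline can be approximated in a few lines. The sketch below is a minimal illustration of kNN WSD over BERT vectors using Hugging Face transformers and scikit-learn, not the authors' exact setup: the model name, the toy "bank" sentences, the sense labels, and the `target_embedding` helper are all illustrative assumptions.

```python
# Minimal sketch of kNN word sense disambiguation over BERT contextualized embeddings.
# Illustrative only: model choice, toy data, and the target_embedding helper are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.neighbors import KNeighborsClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def target_embedding(sentence: str, target: str):
    """Contextualized vector of the first occurrence of `target` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = tokens.index(target.lower())               # exact WordPiece match; real systems align by character offsets
    return hidden[idx].numpy()

# Hypothetical sense-annotated training sentences for the lemma "bank"
train = [
    ("He sat on the bank of the river.",    "bank", "bank_shore"),
    ("The bank raised its interest rates.", "bank", "bank_finance"),
    ("She fished from the grassy bank.",    "bank", "bank_shore"),
    ("I deposited the check at the bank.",  "bank", "bank_finance"),
]
X = [target_embedding(s, w) for s, w, _ in train]
y = [label for _, _, label in train]

# Classify a test occurrence by its nearest sense-annotated neighbor
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
test_vec = target_embedding("We walked along the bank at sunset.", "bank")
print(knn.predict([test_vec])[0])                    # likely "bank_shore"
```

The interpretability argument follows directly from this setup: calling `knn.kneighbors([test_vec])` returns the indices of the training sentences that determined the predicted sense.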
Results and Comparisons
The paper reports that BERT embeddings outperform the other CWE models for WSD and establish new state-of-the-art results on two standard benchmarks, SensEval-2 and SensEval-3. The evaluation also shows that BERT embeddings cluster polysemous words into distinct sense regions of the vector space, whereas ELMo and Flair embeddings do not exhibit the same degree of separability.
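One illustrative way to quantify this separability claim (not a procedure from the paper) is to score how tightly occurrences group by gold sense in embedding space, for instance with a silhouette score over the vectors and labels from the sketch above:

```python
# Illustrative separability check (not from the paper): silhouette score of
# contextualized vectors grouped by gold sense labels; higher means senses
# form tighter, better-separated clusters.
import numpy as np
from sklearn.metrics import silhouette_score

X_arr = np.stack(X)   # vectors and labels reused from the kNN sketch above
print("silhouette:", round(float(silhouette_score(X_arr, y)), 3))
```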
Implications
The findings underscore the potential of transformer-based models like BERT in encoding meaningful semantic distinctions beyond traditional embeddings. In practice, this may lead to more robust and interpretable NLP systems capable of sophisticated language understanding tasks like WSD across diverse contexts.
Future Directions
The authors suggest exploring unsupervised sense induction through cluster analyses within CWE spaces, and hint at extending the evaluation to newer models such as RoBERTa and XLNet that build on BERT's architecture. Analyzing near-miss errors and testing more expressive classifiers could further mitigate problems caused by sparse training data.
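As a rough illustration of the suggested direction (again assuming the hypothetical `target_embedding` helper from the first sketch), unsupervised sense induction can be approximated by clustering the contextualized vectors of a lemma's occurrences and treating each cluster as an induced sense:

```python
# Hedged sketch of unsupervised sense induction: cluster contextualized vectors
# of a lemma's occurrences and treat each cluster as an induced sense.
import numpy as np
from sklearn.cluster import KMeans

occurrences = [
    "The bank approved the loan.",
    "They picnicked on the river bank.",
    "Her bank charges no fees.",
    "The steep bank was covered in wildflowers.",
]
vecs = np.stack([target_embedding(s, "bank") for s in occurrences])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vecs)
for cluster, sentence in sorted(zip(labels, occurrences)):
    print(cluster, sentence)   # occurrences grouped by induced sense
```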
In conclusion, this research demonstrates BERT's proficiency in disentangling word senses through contextual embeddings, advancing the methodology for accurate and interpretable word sense disambiguation in NLP.