Revisiting Word Embeddings in the LLM Era (2402.11094v3)

Published 16 Feb 2024 in cs.CL

Abstract: LLMs have recently shown remarkable advancement in various NLP tasks. As such, a popular trend has emerged lately where NLP researchers extract word/sentence/document embeddings from these large decoder-only models and use them for various inference tasks with promising results. However, it is still unclear whether the performance improvement of LLM-induced embeddings is merely because of scale or whether underlying embeddings they produce significantly differ from classical encoding models like Word2Vec, GloVe, Sentence-BERT (SBERT) or Universal Sentence Encoder (USE). This is the central question we investigate in the paper by systematically comparing classical decontextualized and contextualized word embeddings with the same for LLM-induced embeddings. Our results show that LLMs cluster semantically related words more tightly and perform better on analogy tasks in decontextualized settings. However, in contextualized settings, classical models like SimCSE often outperform LLMs in sentence-level similarity assessment tasks, highlighting their continued relevance for fine-grained semantics.

Citations (10)

Summary

  • The paper's main contribution is a systematic benchmark of LLM-based embeddings against classical models, using cosine similarity and word analogy tests.
  • It demonstrates that LLMs like ADA and PaLM achieve superior semantic clustering and word-pair similarity compared to traditional methods.
  • It reveals that SBERT, while resource-efficient, maintains notable agreement with LLMs, highlighting its potential in constrained environments.

Revisiting Word Embeddings in the LLM Era

Introduction

The paper "Revisiting Word Embeddings in the LLM Era" (2402.11094) investigates the efficacy of LLMs as word embedding models, particularly in comparison to classical methods like SBERT and USE. The study critically examines whether the enhanced performance of LLMs is primarily due to their scale or if they inherently produce distinct embeddings. The authors focus on two primary analyses: word-pair similarity and word analogy tasks, using metrics such as cosine similarity to explore embeddings' latent semantic spaces.

Word-Pair Similarity Analysis

One of the principal investigations in the paper revolves around analyzing how LLM-based embeddings contrast with classical ones in terms of word-pair cosine similarity. Using a corpus of approximately 80,000 distinct words from WordNet, the authors compute the cosine similarity of every word pair across different models, totaling around 6.4 billion pairs.
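
As a concrete illustration of this setup, below is a minimal sketch (not the authors' released code) of how the similarity distribution over all word pairs could be computed. Here `embeddings` is assumed to be a pre-extracted (num_words, dim) matrix with one row per vocabulary word, and the computation is blocked because the full ~6.4-billion-pair matrix cannot be held in memory at once.

```python
import numpy as np

def cosine_similarity_histogram(embeddings: np.ndarray,
                                bins: int = 100,
                                block: int = 1024):
    """Histogram of cosine similarities over all word pairs.

    Counts each ordered pair (including self-pairs), which is adequate for
    inspecting the shape of the distribution.
    """
    # L2-normalise once so the dot products below are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    edges = np.linspace(-1.0, 1.0, bins + 1)
    counts = np.zeros(bins, dtype=np.int64)
    n = unit.shape[0]
    for start in range(0, n, block):
        sims = unit[start:start + block] @ unit.T          # (block, n) block of similarities
        hist, _ = np.histogram(np.clip(sims, -1.0, 1.0), bins=edges)
        counts += hist
    return counts, edges

# Example with random vectors standing in for real model embeddings:
rng = np.random.default_rng(0)
counts, edges = cosine_similarity_histogram(rng.normal(size=(500, 64)))
```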

The results, depicted in Figure 1, demonstrate that LLM-based embeddings, particularly ADA and LLaMA, exhibit higher expected cosine similarity for random word pairs than classical models. This finding underscores significant differences in the latent semantic spaces of the models, as LLMs evidently cluster semantically related words more tightly.

Figure 1: The distribution of cosine similarities between all pairs of words for each model.

Further, the study uses the BATS dataset to compare morphologically and semantically related word pairs. Notably, as shown in Figure 2, ADA, PaLM, and SBERT separate related pairs from unrelated ones more cleanly than the other models, indicating a stronger capture of semantic similarity.

Figure 2: Histogram showing the distribution of cosine similarities for random, morphologically related, and semantically related pairs of words for each model.
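
The related-versus-random comparison behind Figure 2 can be sketched in the same spirit. In the sketch below, `related_pairs` is assumed to be a list of (word, word) tuples drawn from a BATS category and `emb` a hypothetical lookup returning unit-norm vectors; neither comes from the paper's code.

```python
import numpy as np

def pair_similarities(pairs, emb):
    """Cosine similarity for each word pair (vectors assumed unit-norm)."""
    return np.array([float(emb(w1) @ emb(w2)) for w1, w2 in pairs])

def mean_separation(related_sims: np.ndarray, random_sims: np.ndarray) -> float:
    """Crude separation score: gap between the mean similarity of related
    pairs and that of randomly sampled pairs."""
    return float(related_sims.mean() - random_sims.mean())
```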

Word Analogy Task Analysis

The paper extends its exploration to the word analogy task, traditionally used to evaluate classical embedding models. It evaluates LLMs on the BATS dataset under several analogy-solving metrics, including 3CosAdd and LRCos, among others.
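
For reference, here is a hedged sketch of the 3CosAdd metric (the classic vector-offset method; LRCos and the other metrics used in the paper are not shown). `vocab` and `vectors` are hypothetical stand-ins for a model's vocabulary and its L2-normalised embedding matrix.

```python
import numpy as np

def three_cos_add(a: str, b: str, c: str,
                  vocab: list[str], vectors: np.ndarray) -> str:
    """Solve the analogy a : b :: c : ? by maximising cos(d, b - a + c)."""
    idx = {w: i for i, w in enumerate(vocab)}
    target = vectors[idx[b]] - vectors[idx[a]] + vectors[idx[c]]
    target /= np.linalg.norm(target)
    scores = vectors @ target              # cosine scores (rows are unit-norm)
    for w in (a, b, c):                    # standard exclusion of the query words
        scores[idx[w]] = -np.inf
    return vocab[int(np.argmax(scores))]
```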

The results reveal that ADA and PaLM perform significantly better than classical embeddings. Interestingly, SBERT, despite being a classical model, often ranks as the third-best performer, hinting at its potential as a resource-efficient alternative. These results are summarized in Figure 3, which shows the ranking differences among the embeddings for certain BATS categories.

Figure 3: For each model, the cosine similarities of related word pairs were computed and ranked against all word pairs; shown here are the ranking differences between model pairs for certain BATS categories.

Correlation and Agreement Among Models

The study further examines the degree of agreement among the embedding models on word-pair similarities. The results, shown in Figure 4, indicate that only ADA and PaLM, both LLMs, consistently align in their semantic assessments. Surprisingly, SBERT also shows a notable degree of agreement with these much larger LLMs, demonstrating its versatility and robustness despite being a lighter model.

Figure 4: Kendall tau correlation coefficients for each pair of models, computed over a large set of word pairs.
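
A rough sketch of this agreement analysis, assuming `sims_model_a` and `sims_model_b` are aligned arrays holding one cosine similarity per sampled word pair from two different models (synthetic stand-ins below), could compute the rank correlation with SciPy:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
# Synthetic stand-ins: model B roughly agrees with model A, plus noise.
sims_model_a = rng.uniform(-1.0, 1.0, size=10_000)
sims_model_b = sims_model_a + rng.normal(scale=0.2, size=10_000)

tau, p_value = kendalltau(sims_model_a, sims_model_b)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.2g})")
```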

Conclusion

The investigation establishes that LLM-based word embeddings generally offer significant gains in semantic clustering and analogy-task performance over classical models, with ADA and PaLM demonstrating particularly strong capabilities. However, the findings also show that SBERT, a classical model, holds its ground well and, given its agreement with far more resource-intensive LLMs, remains a viable option for resource-constrained environments. These insights pave the way for future work on the nuanced distinctions between, and potential integrations of, classical and LLM-based word embeddings in NLP applications.
