Abstract

Embeddings from LLMs have emerged as critical components in various applications, particularly for information retrieval. While high-dimensional embeddings generally demonstrate superior performance because they carry more salient information, their practical application is frequently hindered by elevated computational latency and the associated higher cost. To address these challenges, we propose Matryoshka-Adaptor, a novel tuning framework designed for the customization of LLM embeddings. Matryoshka-Adaptor facilitates substantial dimensionality reduction while maintaining comparable performance levels, thereby achieving a significant enhancement in computational efficiency and cost-effectiveness. Our framework directly modifies the embeddings from pre-trained LLMs and is designed to integrate seamlessly with any LLM architecture, including those accessible exclusively through black-box APIs, and it is effective in both unsupervised and supervised learning settings. A rigorous evaluation conducted across a diverse corpus of English, multilingual, and multimodal datasets consistently reveals substantial gains with Matryoshka-Adaptor. Notably, with Google and OpenAI Embedding APIs, Matryoshka-Adaptor achieves a reduction in dimensionality ranging from two- to twelve-fold without compromising performance across multiple BEIR datasets.

Figure: Matryoshka-Adaptor reduces embedding dimensionality while maintaining nDCG@10 performance in both supervised and unsupervised scenarios.

Overview

  • The Matryoshka-Adaptor framework enables the reduction of embedding dimensions from LLMs while maintaining performance in tasks such as information retrieval (IR), thus tackling the computational cost issues associated with high-dimensional embeddings.

  • It implements both unsupervised and supervised learning techniques, using pairwise and top-k similarity loss functions in the unsupervised setting, and ranking loss functions with labeled data in the supervised setting, to tailor embeddings specifically to the needs of the task.

  • Extensive experiments on datasets like BEIR, MIRACL, and Fashion-200K show that Matryoshka-Adaptor achieves significant dimensionality reduction without performance loss, outperforming traditional techniques like PCA and demonstrating robustness across different languages and modalities.

Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions

In their paper, “Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions,” Jinsung Yoon, Raj Sinha, Sercan Ö. Arık, and Tomas Pfister introduce Matryoshka-Adaptor, a novel framework for tuning Large Language Model (LLM) embeddings. The framework reduces the dimensionality of embeddings while maintaining performance across various tasks, notably in information retrieval (IR). This work is motivated by the challenge of handling the high computational latency and cost associated with high-dimensional embeddings, which often deter their practical application in latency-sensitive systems.

Overview of Matryoshka-Adaptor

The core of the Matryoshka-Adaptor framework is its ability to customize embeddings obtained from pre-trained LLMs, whether those embeddings come from models with accessible weights or only from black-box APIs. It supports both unsupervised and supervised learning scenarios:

  1. Unsupervised Setting: Matryoshka-Adaptor uses pairwise and top-k similarity loss functions to transform original embeddings into lower-dimensional embeddings that retain the salient features of their higher-dimensional counterparts. This setting does not require any labeled data, making it adaptable to a wide range of scenarios where only corpus data is available.
  2. Supervised Setting: Here, the framework leverages labeled (query, corpus) pairs to further refine the embeddings. A ranking loss function is introduced alongside the unsupervised similarity losses. This setting aims to enhance information retrieval by tailoring embeddings specifically to the task requirements (a minimal sketch of both settings follows below).
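
To make the two settings concrete, the following is a minimal PyTorch sketch of how such an adaptor and its losses could be implemented. The adaptor architecture, the prefix-dimension schedule (`dims`), the exact loss formulations, and all hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a small adaptor network is trained on top
# of frozen LLM embeddings so that low-dimensional prefixes of its output preserve
# the similarity structure of the full embeddings.
import torch
import torch.nn.functional as F


class Adaptor(torch.nn.Module):
    """Residual MLP mapping frozen d-dim embeddings to adapted d-dim embeddings."""

    def __init__(self, dim: int, hidden: int = 1024):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the original embedding as the starting point.
        return x + self.net(x)


def pairwise_similarity_loss(adapted, original, dims=(64, 128, 256)):
    """Unsupervised: the cosine-similarity matrix of the truncated (first-m-dims)
    adapted embeddings should match that of the full original embeddings."""
    ref = F.normalize(original, dim=-1)
    target = ref @ ref.T
    loss = 0.0
    for m in dims:
        z = F.normalize(adapted[:, :m], dim=-1)
        loss = loss + F.mse_loss(z @ z.T, target)
    return loss


def topk_similarity_loss(adapted, original, k=10, dims=(64, 128, 256)):
    """Unsupervised: preserve each embedding's similarity to its k nearest
    neighbours (under the original embeddings) at every truncated dimension."""
    ref = F.normalize(original, dim=-1)
    sim_full = ref @ ref.T
    topk_vals, topk_idx = sim_full.topk(k, dim=-1)
    loss = 0.0
    for m in dims:
        z = F.normalize(adapted[:, :m], dim=-1)
        sim_m = z @ z.T
        loss = loss + F.mse_loss(sim_m.gather(-1, topk_idx), topk_vals)
    return loss


def ranking_loss(q_adapted, c_adapted, positive_idx, dims=(64, 128, 256), margin=0.2):
    """Supervised: each query's labeled positive document should out-score
    in-batch negatives at every truncated dimension (hinge on cosine scores)."""
    loss = 0.0
    for m in dims:
        q = F.normalize(q_adapted[:, :m], dim=-1)
        c = F.normalize(c_adapted[:, :m], dim=-1)
        scores = q @ c.T                                      # (n_queries, n_docs)
        pos = scores.gather(-1, positive_idx.unsqueeze(-1))   # positive score per query
        neg_mask = torch.ones_like(scores, dtype=torch.bool)
        neg_mask.scatter_(-1, positive_idx.unsqueeze(-1), False)
        loss = loss + (F.relu(margin + scores - pos) * neg_mask).mean()
    return loss
```

In the unsupervised case only the two similarity losses are used, so training needs nothing beyond the corpus embeddings themselves; in the supervised case the ranking loss is added on top, using the labeled (query, corpus) pairs. In both cases the underlying LLM stays frozen, which is what allows the approach to work even with black-box embedding APIs.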

Experimental Validation

Their method was rigorously evaluated across several datasets, including 13 BEIR, 17 MIRACL, and 5 Fashion-200K datasets, covering various languages and multimodal data. The experiments utilized embeddings from state-of-the-art models, including those from Google and OpenAI.

  • Google and OpenAI Embedding APIs: Results demonstrated substantial dimensionality reductions (ranging from two- to twelve-fold) without significant loss in performance across multiple BEIR datasets.
  • Unsupervised Matryoshka-Adaptor: Significant performance improvements were noted, particularly at lower dimensions. The framework outperformed standard dimensionality reduction techniques like PCA.
  • Supervised Matryoshka-Adaptor: When fine-tuning embeddings with supervised data, Matryoshka-Adaptor exhibited considerable performance gains, emphasizing its utility in improving retrieval tasks without increasing latency.

Numerical Results

  • BEIR Datasets: For example, with OpenAI text-embedding-3-large, Matryoshka-Adaptor achieved performance akin to full-dimensional embeddings but at reduced dimensions as low as 64.
  • MIRACL Datasets: Google’s multilingual embeddings, upon being processed with Matryoshka-Adaptor, showed similar gains, demonstrating robustness across languages.
  • Fashion-200K: In multimodal scenarios, the framework preserved the effectiveness of embeddings even when significantly reduced in dimensionality.

Implications and Future Directions

Matryoshka-Adaptor has several practical and theoretical implications. In practice, the framework can be integrated into existing IR systems to alleviate the computational burden of high-dimensional embeddings. Transforming embeddings into more compact, computationally efficient forms without losing performance can significantly benefit large-scale recommendation systems and other real-time applications.
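
As a rough illustration of such an integration, the sketch below adapts and truncates corpus embeddings once at index time so that retrieval afterwards runs entirely on low-dimensional vectors. All names are hypothetical; the adaptor is assumed to be trained as in the earlier sketch, and the embedding source can be any existing embedding service.

```python
# Hypothetical deployment sketch: a trained adaptor sits between an existing
# embedding service and the vector index; only the first TARGET_DIM coordinates
# of each adapted vector are stored and searched.
import numpy as np
import torch
import torch.nn.functional as F

TARGET_DIM = 64  # illustrative; the reported reductions on BEIR range from 2x to 12x


def adapt_and_truncate(adaptor: torch.nn.Module, embeddings: np.ndarray,
                       dim: int = TARGET_DIM) -> np.ndarray:
    """Apply the trained adaptor, keep the first `dim` coordinates, re-normalize."""
    with torch.no_grad():
        adapted = adaptor(torch.from_numpy(embeddings).float())[:, :dim]
    return F.normalize(adapted, dim=-1).numpy()


def search(query_vec: np.ndarray, corpus_vecs: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Inner-product search over the normalized, truncated corpus vectors."""
    scores = corpus_vecs @ query_vec
    return np.argsort(-scores)[:top_k]
```

Because both index size and scoring cost scale with the stored dimensionality, a two- to twelve-fold reduction translates directly into smaller indexes and faster similarity search.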

From a theoretical perspective, Matryoshka-Adaptor provides new insights into representation learning by showing that embeddings can be fine-tuned post hoc to achieve near-optimal performance at lower dimensions. This challenges the traditional understanding that high-dimensional spaces are strictly necessary for achieving superior model performance.

As a future direction, the authors suggest that the framework could be extended to semi-supervised learning or be adapted to support simultaneous tuning across multiple datasets or modalities. This would further enhance its utility and applicability to a broader range of machine learning tasks.

In summary, Matryoshka-Adaptor is a significant contribution to the field of representation learning. By enabling effective dimensionality reduction while maintaining performance, it opens new avenues for the practical application of LLM embeddings across various domains.
