Matryoshka Representation Learning

(arXiv:2205.13147)
Published May 26, 2022 in cs.LG and cs.CV

Abstract

Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context, rigid, fixed-capacity representations can be either over- or under-accommodating to the task at hand. This leads us to ask: can we design a flexible representation that can adapt to multiple downstream tasks with varying computational resources? Our main contribution is Matryoshka Representation Learning (MRL), which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offers: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet), vision + language (ALIGN) and language (BERT). MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL.

Overview

  • MRL introduces an adaptable representation methodology that functions across multiple granularities, balancing accuracy with computational efficiency.

  • It nests multiple lower-dimensional vectors inside a single high-dimensional embedding, so one learned vector serves many trade-offs between size and information richness.

  • In classification, MRL reduces embedding sizes by up to 14-fold while maintaining model performance; see the adaptive cascade sketch after this list.

  • For retrieval tasks, MRL is efficient both in theory and in practice, with theoretical speed-ups of up to 128x and real-world wall-clock speed-ups of up to 14x at large scale.

  • MRL improves long-tail few-shot classification, adapts across modalities (vision, vision + language, and language), and aids analyses of classification difficulty and model interpretability.
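
The adaptive classification mentioned above works as a cascade: classify with the smallest embedding prefix first, and escalate to a larger prefix only when the prediction is not confident enough. Below is a minimal PyTorch sketch of this idea; the function name, dimensions, and fixed threshold are illustrative assumptions rather than the MRL codebase's API (the paper tunes such thresholds on held-out data).

```python
import torch.nn.functional as F

def adaptive_classify(z, heads, dims=(8, 16, 32, 64), threshold=0.9):
    """Cascade over nested prefixes of a Matryoshka embedding.

    z: (d,) embedding; heads[m]: a linear classifier for the m-dim prefix.
    Returns the predicted label and the prefix dimension actually used.
    """
    for m in dims:
        probs = F.softmax(heads[m](z[:m]), dim=-1)
        conf, pred = probs.max(dim=-1)
        # Stop early once confident, or fall through to the largest prefix.
        if conf >= threshold or m == dims[-1]:
            return pred.item(), m
```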

Introduction

Representation learning is a cornerstone of modern ML systems, impacting a range of downstream tasks. Fixed-capacity representations, the norm in deep learning architectures, can underperform when faced with varying computational and statistical task requirements. Matryoshka Representation Learning (MRL) addresses this rigidity by learning a single representation that is usable at multiple granularities, improving the trade-off between accuracy and computational efficiency.

Matryoshka Representation Learning

MRL trains a flexible representation in which a single high-dimensional vector nests multiple lower-dimensional representations. By explicitly optimizing log-spaced lower-dimensional prefixes of one embedding, MRL produces representations that are information-rich for their size without incurring additional cost during inference. The framework integrates seamlessly into existing pipelines and is suitable for tasks in computer vision and natural language processing.
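
Concretely, the training objective amounts to summing the task loss over each nested prefix of the embedding. A minimal PyTorch sketch for supervised classification follows; the class and function names are illustrative rather than taken from the released MRL code, and the nesting dimensions mirror the log-spaced granularities described above.

```python
import torch.nn as nn
import torch.nn.functional as F

class MatryoshkaClassifier(nn.Module):
    def __init__(self, backbone, num_classes=1000,
                 nesting_dims=(8, 16, 32, 64, 128, 256, 512, 1024, 2048)):
        super().__init__()
        self.backbone = backbone            # encoder: inputs -> (B, max_dim) embeddings
        self.nesting_dims = nesting_dims    # log-spaced nested granularities
        # One linear head per granularity, each reading a prefix of the embedding.
        self.heads = nn.ModuleList([nn.Linear(m, num_classes) for m in nesting_dims])

    def forward(self, x):
        z = self.backbone(x)                # (B, max_dim)
        # Logits computed from every nested prefix z[:, :m].
        return [head(z[:, :m]) for m, head in zip(self.nesting_dims, self.heads)]

def matryoshka_loss(logits_per_dim, targets):
    # Equal-weight sum of cross-entropy over all granularities, so every
    # prefix of the embedding is explicitly optimized.
    return sum(F.cross_entropy(logits, targets) for logits in logits_per_dim)
```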

Applications in Classification and Retrieval

In classification, MRL significantly reduces average embedding sizes while maintaining accuracy: implemented within an adaptive classification framework, it yields embeddings up to 14 times smaller than baselines with no drop in model performance. For retrieval, MRL delivers theoretical speed-ups of up to 128 times and real-world wall-clock speed-ups of up to 14 times in large-scale settings.
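
The retrieval speed-ups come from a shortlist-then-rerank funnel: fetch candidates using only a low-dimensional prefix of the embeddings, then re-rank that small shortlist with the full-dimensional vectors. Here is a hedged NumPy sketch using exact search with illustrative names and parameters; in practice, the shortlisting stage would typically go through an approximate nearest-neighbor index.

```python
import numpy as np

def funnel_retrieval(query, database, low_dim=16, shortlist=200, k=10):
    """query: (d,), database: (N, d); L2-normalized Matryoshka embeddings."""
    # Stage 1: cheap shortlist using only the first `low_dim` coordinates.
    coarse = database[:, :low_dim] @ query[:low_dim]
    candidates = np.argpartition(-coarse, shortlist)[:shortlist]
    # Stage 2: re-rank the small candidate set with the full-dimensional vectors.
    fine = database[candidates] @ query
    return candidates[np.argsort(-fine)[:k]]
```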

Downstream Task Improvements and Extensions

Matryoshka Representations also improve performance on long-tail few-shot classification while remaining as robust as the original embeddings. MRL's design extends to web-scale datasets and adapts across modalities, from vision and language models to standard classification frameworks. The paper's analysis further supports using the nested granularities to gauge classification difficulty and identify information bottlenecks, with broader implications for model interpretability.
