
Abstract

While LLMs show remarkable performance in natural language understanding, their resource-intensive nature makes them less accessible. In contrast, smaller language models such as MiniCPM offer more sustainable scalability, but often underperform without specialized optimization. In this paper, we explore the enhancement of smaller language models through the improvement of their text embeddings. We select three language models, MiniCPM, Phi-2, and Gemma, to conduct contrastive fine-tuning on the NLI dataset. Our results demonstrate that this fine-tuning method enhances the quality of text embeddings for all three models across various benchmarks, with MiniCPM showing the most significant improvement: an average performance gain of 56.33%. The contrastive fine-tuning code is publicly available at https://github.com/trapoom555/Language-Model-STS-CFT.

Figure: Fine-tuning loss progression throughout training steps.

Overview

  • The paper presents a study on enhancing text embeddings for smaller language models like MiniCPM, Phi-2, and Gemma through contrastive fine-tuning, with a significant focus on MiniCPM due to its notable improvements.

  • The methodology involves the use of contrastive fine-tuning with Low-Rank Adaptation (LoRA) and Natural Language Inference (NLI) datasets to refine the models' semantic understanding and distinguish between similar and dissimilar text pairs efficiently.

  • Experimental results show that MiniCPM significantly outperforms other models on various benchmarks, emphasizing the effectiveness of contrastive fine-tuning in making smaller language models viable for resource-constrained NLP applications.

Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

The paper "Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning" presents a study on enhancing the text embeddings of smaller language models (LMs) through the application of contrastive fine-tuning. This research focuses on three specific models: MiniCPM, Phi-2, and Gemma, with a particular emphasis on MiniCPM due to its demonstrated capacity for substantial improvement.

Introduction

Text embeddings play a crucial role in various NLP tasks, including information retrieval, document classification, and semantic textual similarity (STS). While LLMs such as GPT-4 have shown significant capabilities in natural language understanding, these models are often resource-intensive and less accessible. Smaller models like MiniCPM, Gemma, and Phi-2 offer a more scalable solution but tend to underperform without specific optimizations. This paper addresses the gap by improving the text embedding quality of smaller language models, making them viable alternatives for resource-constrained applications.

Methodology

The core methodology involves contrastive fine-tuning, a technique that enhances the models' ability to distinguish between semantically similar and dissimilar text pairs. The research leverages a parameter-efficient fine-tuning method, Low-Rank Adaptation (LoRA), to ensure the process remains computationally feasible. The training dataset used is a processed version of the Natural Language Inference (NLI) dataset, consisting of approximately 275,000 samples.
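
As a rough illustration of such a parameter-efficient setup, the sketch below wraps a base model with LoRA adapters via the Hugging Face peft library. The checkpoint name, rank, scaling factor, and target modules are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical LoRA setup for parameter-efficient contrastive fine-tuning.
# Checkpoint name, rank, alpha, dropout, and target modules are assumptions
# for illustration, not the paper's reported configuration.
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "openbmb/MiniCPM-2B-dpo-bf16"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(base_model_name, trust_remote_code=True)

lora_config = LoraConfig(
    r=8,                      # low-rank dimension (assumed)
    lora_alpha=16,            # scaling factor (assumed)
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="FEATURE_EXTRACTION",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are updated
```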

Contrastive Fine-tuning Approach

The contrastive fine-tuning approach is designed to improve the models' semantic understanding by aligning similar text representations closely in the embedding space while pushing dissimilar ones apart. This is achieved using the InfoNCE loss with in-batch negatives and hard negatives, formulated as:

[ \min - \log \frac{e^{\text{sim}(h_i, h_i^+) / \tau}}{\sum_{j=1}^{N} \left( e^{\text{sim}(h_i, h_j^+) / \tau} + e^{\text{sim}(h_i, h_j^-) / \tau} \right)} ]

Here, ( h_i ) denotes the embedding vector of a premise ( x_i ), ( h_j^+ ) and ( h_j^- ) denote the embeddings of positive and hard-negative examples, respectively, ( \tau ) is a temperature parameter, and ( \text{sim}(\cdot, \cdot) ) computes the cosine similarity between embedding vectors.
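
A minimal PyTorch sketch of this objective follows. The batch layout (one positive and one hard negative per premise) mirrors the formula above, while the temperature value and the use of explicit normalization are illustrative assumptions.

```python
# Sketch of the InfoNCE objective with in-batch negatives and hard negatives.
# h: premise embeddings, h_pos: positive embeddings, h_neg: hard-negative
# embeddings, each of shape (N, d). Temperature and pooling are assumptions.
import torch
import torch.nn.functional as F

def info_nce_loss(h, h_pos, h_neg, tau=0.05):
    h = F.normalize(h, dim=-1)
    h_pos = F.normalize(h_pos, dim=-1)
    h_neg = F.normalize(h_neg, dim=-1)

    sim_pos = h @ h_pos.T / tau  # (N, N): cosine similarity to every positive
    sim_neg = h @ h_neg.T / tau  # (N, N): cosine similarity to every hard negative

    # The denominator sums over all positives and hard negatives in the batch;
    # the numerator is the diagonal entry (the matching premise-positive pair).
    logits = torch.cat([sim_pos, sim_neg], dim=1)      # (N, 2N)
    labels = torch.arange(h.size(0), device=h.device)  # index of h_i^+ for each h_i
    return F.cross_entropy(logits, labels)
```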

Experiments and Results

Benchmark Evaluation

The models were evaluated on nine STS benchmarks: STS12, STS13, STS14, STS15, STS16, STS17, STSBenchmark, BIOSSES, and SICK-R. These benchmarks cover a broad spectrum of sentence pairs, ranging from general news headlines to biomedical text. The evaluation metric was the Spearman correlation between the cosine similarities of model-generated embeddings and the ground-truth similarity scores.
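
A minimal sketch of this evaluation protocol is shown below, assuming a placeholder `embed` function that maps a sentence to its embedding vector.

```python
# Sketch of the STS evaluation: Spearman correlation between cosine
# similarities of sentence embeddings and human-annotated similarity scores.
# `embed` is a placeholder for the model's sentence-embedding function.
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(sent_pairs, gold_scores, embed):
    cos_sims = []
    for s1, s2 in sent_pairs:
        e1, e2 = embed(s1), embed(s2)
        cos = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
        cos_sims.append(cos)
    # Rank correlation against the ground-truth similarity scores
    return spearmanr(cos_sims, gold_scores).correlation
```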

The results, as summarized in the paper's model performance table, demonstrate that MiniCPM significantly outperforms both Gemma and Phi-2 across all benchmarks, with an average performance gain of 56.33%. Specifically, MiniCPM achieved the highest correlations on datasets such as STS12 (76.38%) and STS17 (89.96%), indicating its robust capability in capturing semantic similarities.

Ablation Studies

Several ablation studies were conducted to delve deeper into the model's performance:

  1. Pre-Fine-Tuning Performance: Comparing each model before and after fine-tuning showed that MiniCPM gained the most, emphasizing the effectiveness of the fine-tuning process.
  2. Impact of Learning Rate: It was found that a learning rate of (5 \times 10^{-5}) yielded the best results, whereas higher learning rates led to instability and underfitting.
  3. Prompting Techniques: While the original MiniCPM model benefited from specific prompt designs, the fine-tuned model exhibited marginal gains, suggesting a model-specific preference for sentence structures encountered during training.
  4. Training Data Efficiency: The model showed rapid performance gains within the first 200 training steps, showcasing high training efficiency.
  5. Hard Negatives Penalty: Penalizing hard negatives was generally beneficial, improving performance across most benchmarks.

Conclusion

This research underscores the viability of using contrastive fine-tuning to enhance the text embedding quality of smaller language models. The significant improvements observed, particularly in the MiniCPM model, highlight its potential for deployment in resource-constrained environments. The studies within the paper also offer valuable insights into the configurations that maximize the efficiency and effectiveness of the fine-tuning process.

Implications and Future Directions

The improved performance of smaller LMs through contrastive fine-tuning opens up new avenues for practical applications where computational resources are limited. This research contributes to making low-resource NLP tasks more accessible and efficient. Future developments could explore more advanced fine-tuning techniques, further optimization of learning rates, and broader applications of these smaller models across diverse NLP tasks.

The paper's code and models are publicly available, contributing to further advancements in the community and fostering collaborative improvements in the field of text embeddings.
