Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 173 tok/s
Gemini 2.5 Pro 54 tok/s Pro
GPT-5 Medium 36 tok/s Pro
GPT-5 High 35 tok/s Pro
GPT-4o 110 tok/s Pro
Kimi K2 221 tok/s Pro
GPT OSS 120B 444 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models (2407.04573v2)

Published 5 Jul 2024 in cs.IR and cs.CL

Abstract: Vector retrieval algorithms are essential for semantic queries within the rapidly evolving landscape of LLMs. The ability to retrieve vectors that satisfy both similarity and diversity criteria substantially enhances the performance of LLMs. Although Maximal Marginal Relevance (MMR) is widely employed in retrieval scenarios requiring relevance and diversity, variations in the parameter $\lambda$ lead to fluctuations that complicate the optimization trajectory in vector spaces. This obscures the direction of improvement and highlights the lack of a robust theoretical analysis regarding similarity and diversity constraints in retrieval processes. To address these challenges, this paper introduces a novel approach that characterizes both constraints through the relationship between the sum vector and the query vector. The proximity of these vectors ensures the similarity constraint, while requiring individual vectors within the sum vector to diverge in their alignment with the query vector satisfies the diversity constraint. We first formulate a new combinatorial optimization problem, selecting k vectors from a candidate set such that their sum vector maximally aligns with the query vector, and demonstrate that this problem is NP-complete. This result underscores the inherent difficulty of simultaneously achieving similarity and diversity in vector retrieval, thereby providing a theoretical foundation for future research. Subsequently, we present the heuristic algorithm Vectors Retrieval with Similarity and Diversity, VRSD, which features a clear optimization objective and eliminates the need for preset parameters. VRSD also achieves a modest reduction in time complexity compared to MMR. Empirical validation confirms that VRSD significantly outperforms MMR across various datasets.

Citations (1)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 2 tweets and received 0 likes.

Upgrade to Pro to view all of the tweets about this paper:

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube