Abstract

Vector search systems, pivotal in AI applications, often rely on the Hierarchical Navigable Small Worlds (HNSW) algorithm. However, the behaviour of HNSW under real-world scenarios using vectors generated with deep learning models remains under-explored. Existing Approximate Nearest Neighbours (ANN) benchmarks and research typically over-rely on simplistic datasets such as MNIST or SIFT1M and fail to reflect the complexity of current use-cases. Our investigation focuses on HNSW's efficacy across a spectrum of datasets, including synthetic vectors tailored to mimic specific intrinsic dimensionalities, widely-used retrieval benchmarks with popular embedding models, and proprietary e-commerce image data with CLIP models. We survey the most popular HNSW vector databases and collate their default parameters to provide a realistic fixed parameterisation used throughout the paper. We discover that the recall of approximate HNSW search, in comparison to exact K Nearest Neighbours (KNN) search, is linked to the vector space's intrinsic dimensionality and significantly influenced by the data insertion sequence. Our methodology highlights how insertion order, informed by measurable properties such as pointwise Local Intrinsic Dimensionality (LID) or known categories, can shift recall by up to 12 percentage points. We also observe that running popular benchmark datasets with HNSW instead of KNN can shift rankings by up to three positions for some models. This work underscores the need for more nuanced benchmarks and design considerations when developing robust vector search systems using approximate vector search algorithms. This study presents a number of scenarios with varying real-world applicability that aim to improve understanding and guide future development of ANN algorithms and embedding models.

Figure: Recall for HNSWLib and FAISS on 10,000 vectors with 1,000 queries.

Overview

  • The study comprehensively evaluates the Hierarchical Navigable Small Worlds (HNSW) algorithm's performance, particularly focusing on vectors generated by modern deep learning models, and highlights the limitations of current benchmarks.

  • Key factors affecting HNSW's recall performance, including intrinsic dimensionality of data and data insertion sequence, are examined using synthetic and real-world datasets, revealing significant dependencies and potential optimization avenues.

  • The research proposes a re-evaluation of current benchmarking practices for Approximate Nearest Neighbors (ANN) algorithms, urging the development of evaluation methodologies that better reflect real-world applications.

Analyzing the Performance of HNSW Vector Search Systems in Real-World Scenarios

This paper undertakes a comprehensive study of the Hierarchical Navigable Small Worlds (HNSW) algorithm's efficacy across a range of datasets, particularly focusing on vectors created by contemporary deep learning models. The research addresses a significant gap in the existing literature, where most Approximate Nearest Neighbors (ANN) benchmarks rely heavily on simplistic datasets like MNIST or SIFT1M, which fail to capture the complexity inherent in modern AI applications.

The investigation systematically evaluates the impact of various factors, including the intrinsic dimensionality of vector spaces and the sequence of data insertion, on the recall performance of HNSW. This study comprises tests on synthetic datasets, popular retrieval benchmarks with diverse embedding models, and proprietary e-commerce image data leveraging CLIP models. The methodologies and findings presented shed light on critical aspects of HNSW’s performance, urging a reconsideration of current benchmarking practices for ANN algorithms.
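
The core measurement throughout the paper is the recall of approximate HNSW search against exact KNN on the same vectors. The following Python sketch illustrates that comparison with FAISS at the 10,000-vector/1,000-query scale from the figure above; the dimensionality and the HNSW parameters (M, efConstruction, efSearch) are illustrative assumptions here, not the defaults the paper surveys.

```python
import numpy as np
import faiss

d, n, n_queries, k = 128, 10_000, 1_000, 10
rng = np.random.default_rng(0)
xb = rng.standard_normal((n, d)).astype("float32")
xq = rng.standard_normal((n_queries, d)).astype("float32")

# Exact KNN baseline: brute-force L2 search provides the ground truth.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
_, gt = flat.search(xq, k)

# Approximate HNSW index with illustrative (assumed) parameters.
hnsw = faiss.IndexHNSWFlat(d, 16)      # M = 16 links per node
hnsw.hnsw.efConstruction = 128
hnsw.hnsw.efSearch = 64
hnsw.add(xb)
_, approx = hnsw.search(xq, k)

# Recall@k: fraction of the exact neighbours that HNSW recovers.
recall = np.mean([len(set(a) & set(g)) / k for a, g in zip(approx, gt)])
print(f"recall@{k} = {recall:.3f}")
```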

Key Findings

Impact of Intrinsic Dimensionality on Recall

  • The study reveals that the recall of approximate HNSW search, in comparison to exact KNN search, is intricately linked to the intrinsic dimensionality of the vector space. The researchers generated synthetic data with varying intrinsic dimensionalities using orthonormal basis vectors and evaluated the recall of HNSW implementations (HNSWLib and FAISS); see the sketch after this list. Their findings indicate a significant degradation in recall as the intrinsic dimensionality increases.
  • Figures presented in the paper show a drop in recall of approximately 50% as the data approaches full rank, exhibiting a clear dependency on the intrinsic dimensionality.
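
A minimal sketch of this construction, assuming the "orthonormal basis vectors" approach means sampling low-dimensional coefficients and embedding them into a higher-dimensional ambient space via an orthonormal basis; the paper's exact recipe may differ.

```python
import numpy as np

def synthetic_vectors(n: int, ambient_dim: int, intrinsic_dim: int, seed: int = 0):
    """Gaussian vectors of rank `intrinsic_dim` inside R^ambient_dim."""
    rng = np.random.default_rng(seed)
    # QR factorisation yields an orthonormal basis for a random subspace.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, intrinsic_dim)))
    coeffs = rng.standard_normal((n, intrinsic_dim))
    return (coeffs @ basis.T).astype("float32")

# Sweeping intrinsic_dim toward ambient_dim ("full rank") is the regime
# where the recall degradation described above appears.
for r in (8, 32, 128):
    x = synthetic_vectors(10_000, 128, r)
    print(r, np.linalg.matrix_rank(x))
```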

Influence of Data Insertion Sequence

  • The sequence in which data is inserted into the HNSW index significantly affects recall. This is demonstrated by experiments in which data ordered by descending Local Intrinsic Dimensionality (LID) achieves higher recall than data inserted in ascending-LID or random order; see the sketch after this list.
  • The average recall for descending-LID order was found to be up to 12.8 percentage points higher than for ascending-LID order, indicating a potential avenue for optimizing HNSW graph construction.
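
A sketch of LID-ordered insertion, assuming the widely used Levina-Bickel maximum-likelihood estimator of pointwise LID from k-NN distances; the paper may use a different estimator, and the choice of k here is arbitrary.

```python
import numpy as np
import faiss
import hnswlib

def lid_mle(x: np.ndarray, k: int = 20) -> np.ndarray:
    """Pointwise MLE of LID: -1 / mean_i log(r_i / r_k) over the k nearest neighbours."""
    index = faiss.IndexFlatL2(x.shape[1])
    index.add(x)
    d2, _ = index.search(x, k + 1)               # squared L2; column 0 is the point itself
    r = np.sqrt(np.maximum(d2[:, 1:], 1e-12))    # distances to the k neighbours
    return -1.0 / np.mean(np.log(r / r[:, -1:]), axis=1)

x = np.random.default_rng(0).standard_normal((10_000, 128)).astype("float32")
order = np.argsort(-lid_mle(x))                  # highest LID first

index = hnswlib.Index(space="l2", dim=x.shape[1])
index.init_index(max_elements=len(x), M=16, ef_construction=128)
index.add_items(x[order], order)                 # descending-LID insertion order
```

Querying this index and a second one built in ascending order, with the recall measurement from the first sketch, reproduces the comparison reported above.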

Impact on Retrieval Benchmarks and Model Rankings

  • Standard retrieval benchmark datasets show that model rankings change depending on the retrieval system used for evaluation. This suggests that evaluations done with exact KNN may not be fully representative of those performed using approximate nearest neighbors.
  • This divergence is quantified as shifts of up to three positions on the leaderboard, emphasizing the need for benchmarks that reflect the peculiarities of approximate retrieval systems; the sketch after this list shows one way to test leaderboard stability.
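
One way to check leaderboard stability, sketched below: score each model's embeddings with both an exact flat index and an HNSW index, then compare the two orderings. The `embeddings` mapping, the toy `evaluate` metric, and the HNSW parameters are hypothetical placeholders, not the paper's benchmark protocol.

```python
import numpy as np
import faiss

def evaluate(index, xq, relevant_ids, k=10):
    """Toy metric: mean fraction of relevant documents found in the top-k."""
    _, ids = index.search(xq, k)
    return float(np.mean([len(set(rel) & set(row)) / max(len(rel), 1)
                          for row, rel in zip(ids, relevant_ids)]))

def rank_models(embeddings, relevant_ids, use_hnsw, k=10):
    """embeddings: {model_name: (corpus_vectors, query_vectors)} (hypothetical)."""
    scores = {}
    for name, (xb, xq) in embeddings.items():
        d = xb.shape[1]
        index = faiss.IndexHNSWFlat(d, 16) if use_hnsw else faiss.IndexFlatL2(d)
        index.add(xb)
        scores[name] = evaluate(index, xq, relevant_ids, k)
    return sorted(scores, key=scores.get, reverse=True)

# exact_order = rank_models(embeddings, relevant_ids, use_hnsw=False)
# hnsw_order  = rank_models(embeddings, relevant_ids, use_hnsw=True)
# Any position change between the two orderings is the ranking shift above.
```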

Real-World Dataset Evaluation

  • Evaluations on real-world e-commerce datasets showed substantial variations in recall based on the order of data insertion and the chosen model. For example, on a fashion dataset, recall differences of up to 7.7 percentage points were observed when varying the insertion sequence, even across different CLIP architectures.
  • The inclusion of practical datasets underscores the applicability of the findings beyond controlled experimental settings, suggesting that the insights gained can translate to real-world improvements.

Implications and Future Directions

The implications of this research are manifold. Practically, it suggests that the construction of HNSW indices could be optimized by considering the intrinsic properties of the data, such as intrinsic dimensionality and local neighborhood structures. Theoretically, it calls for a re-evaluation of current benchmarks and encourages the development of more nuanced evaluation methodologies that better reflect the complexities of practical applications.

The insight that model selection for HNSW-based retrieval systems requires more than just adherence to exact KNN benchmarks warrants significant attention. It suggests that models need to be evaluated in the context of their intended use-case environments, particularly when deployed in approximate retrieval systems.

Conclusion

This paper provides a robust analysis of the HNSW algorithm's performance across various datasets and scenarios, highlighting critical factors that influence recall. It advocates for refined benchmarking practices and offers actionable insights into optimizing approximate nearest neighbor search systems. Future research should expand on these findings by exploring similar properties in other approximate retrieval algorithms, aiming to enhance their robustness and performance in real-world applications.
