Abstract

Approximate Nearest Neighbour Search (ANNS) is a subroutine in algorithms routinely employed in information retrieval, pattern recognition, data mining, image processing, and beyond. Recent works have established that, on large datasets, graph-based ANNS algorithms are more efficient in practice than the other methods proposed in the literature. The growing volume and dimensionality of data necessitate designing scalable techniques for ANNS. To this end, prior art has explored parallelizing graph-based ANNS on GPUs, leveraging their high computational power and energy efficiency. Current state-of-the-art GPU-based ANNS algorithms either (i) require both the index graph and the data to reside entirely in GPU memory, or (ii) partition the data into small independent shards, each of which fits in GPU memory, and perform the search on these shards on the GPU. While the first approach fails to handle large datasets due to the limited memory available on the GPU, the latter delivers poor performance on large datasets due to high data traffic over the low-bandwidth PCIe bus. In this paper, we introduce BANG, a first-of-its-kind GPU-based ANNS method that works efficiently on billion-scale datasets that cannot entirely fit in GPU memory. BANG stands out by harnessing compressed data on the GPU to perform distance computations while maintaining the graph on the CPU. BANG incorporates highly optimized GPU kernels and proceeds in stages that run concurrently on the GPU and CPU, taking advantage of their architectural specificities. We evaluate BANG using a single NVIDIA Ampere A100 GPU on ten popular ANN benchmark datasets. BANG outperforms the state-of-the-art in the majority of cases. Notably, on the billion-scale datasets, we are significantly faster than our competitors, achieving throughputs 40x-200x higher than the competing methods at a high recall of 0.9.

Overview

  • The paper introduces a novel method for Approximate Nearest Neighbor Search (ANNS) that leverages both CPU and GPU architectures to efficiently handle billion-scale datasets.

  • Key innovations include the use of compressed data via Product Quantization on the GPU, optimized GPU kernels for various operations, and advanced CPU-GPU synchronization techniques.

  • Extensive evaluations on popular benchmark datasets demonstrate significant performance improvements over existing methods, with throughputs 40x-200x higher for high recall rates.

Billion-Scale Approximate Nearest Neighbor Search Using a Single GPU

The paper "Billion-Scale Approximate Nearest Neighbor Search Using a Single GPU" presents a novel method for Approximate Nearest Neighbor Search (ANNS) that efficiently operates on billion-scale datasets using a single GPU. The proposed method addresses the core challenges posed by large datasets and the limitations of GPU memory while maintaining high recall rates and throughput.

Key Contributions

  1. Hybrid Architecture: The method leverages both CPU and GPU architectures for different tasks. The core innovation lies in using compressed data on the GPU for distance calculations while maintaining the graph structure on the CPU. This dual approach balances the workload, optimizes resource usage, and reduces data transfer overhead.
  2. Compressed Vector Representation: By employing Product Quantization (PQ), the method compresses the dataset vectors before they are processed on the GPU. This compression significantly reduces the memory footprint on the GPU, enabling the processing of billion-scale datasets (a minimal encoding sketch appears after this list).
  3. Optimized GPU Kernels: The implementation includes highly optimized GPU kernels for various operations such as distance computations, sorting, and updating worklists. These optimizations ensure the efficient utilization of GPU resources and maximize the throughput of the ANNS process.
  4. CPU-GPU Synchronization: The method minimizes the data transfer between CPU and GPU by overlapping communication with computation. Advanced CUDA features like asynchronous memcpy APIs and streams are used to hide data transfer latencies and keep both the CPU and GPU occupied concurrently.
  5. Evaluation on Benchmark Datasets: The method is evaluated on ten popular ANN benchmark datasets using a single NVIDIA Ampere A100 GPU. The results demonstrate substantial performance improvements over existing state-of-the-art methods, especially for large datasets.
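
To make the compression in contribution 2 concrete, the following is a minimal NumPy sketch of Product Quantization encoding. It assumes the codebooks (one set of centroids per vector subspace) have already been trained, e.g., with k-means; the shapes, names, and parameters are illustrative and not taken from the paper.

```python
import numpy as np

def pq_encode(X, codebooks):
    """Compress vectors by storing, for each subspace, the index of the
    nearest codebook centroid (1 byte per subspace when k <= 256)."""
    m, k, dsub = codebooks.shape          # m subspaces, k centroids each
    n, d = X.shape
    assert d == m * dsub
    codes = np.empty((n, m), dtype=np.uint8)
    for j in range(m):
        sub = X[:, j * dsub:(j + 1) * dsub]                  # (n, dsub)
        # Squared distance from every subvector to every centroid.
        d2 = ((sub[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)                      # nearest id
    return codes

# Example: 128-dim float32 vectors (512 bytes) -> 16-byte codes.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 128)).astype(np.float32)
codebooks = rng.standard_normal((16, 256, 8)).astype(np.float32)  # untrained, for illustration
codes = pq_encode(X, codebooks)
print(codes.shape)  # (1000, 16): a 32x reduction per vector
```

At this rate of compression, a billion 128-dimensional vectors shrink from roughly 512 GB to roughly 16 GB, which is what makes it feasible to keep the compressed data resident in GPU memory (the exact compression parameters in the paper may differ).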

Detailed Overview

Background and Motivation

ANNS is a crucial algorithm in many fields, including information retrieval, pattern recognition, and data mining. The increasing volume and dimensionality of data necessitate scalable techniques for ANNS. Traditional graph-based ANNS algorithms have shown practical efficiency on large datasets, but their GPU-based implementations face challenges related to GPU memory limitations and CPU-GPU data transfer bottlenecks.

Methodology

The proposed method addresses these challenges by using a hybrid CPU-GPU approach where:

  • The compressed vector data is processed on the GPU for fast distance computations (see the distance-table sketch after this list).
  • The graph structure is maintained on the CPU to handle the large memory requirements of billion-scale datasets.
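
The GPU-side distance computation with PQ codes typically works through a per-query distance table (the paper's stage 1, described below): squared distances from the query's subvectors to every centroid are computed once, after which the approximate distance to any compressed vector is just m table lookups. A minimal NumPy sketch under those assumptions, with illustrative names:

```python
import numpy as np

def build_distance_table(q, codebooks):
    """(m, k) table: squared distance from each query subvector to each
    centroid of the corresponding subspace."""
    m, k, dsub = codebooks.shape
    qs = q.reshape(m, dsub)               # split the query into m subvectors
    return ((qs[:, None, :] - codebooks) ** 2).sum(-1)

def approx_distances(codes, table):
    """Approximate squared distance of each PQ-coded vector to the query:
    sum the m table entries selected by that vector's codes."""
    m = codes.shape[1]
    return table[np.arange(m), codes].sum(axis=1)   # one lookup per subspace

# Usage, with `codes` and `codebooks` from the encoding sketch above:
# table = build_distance_table(q, codebooks)
# d_approx = approx_distances(codes, table)         # (n,) distances
```

The lookup-and-sum is independent across candidates, which is why it parallelizes so well on a GPU and why keeping only the compressed codes on the device suffices for the traversal.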

Core Algorithm

The ANNS process is divided into three primary stages (a combined single-query sketch follows the list):

  1. Distance Table Construction: Pre-computation of distances between query points and cluster centroids using PQ.
  2. ANN Search: Iterative search on the graph structure, using priority worklists and optimized distance calculations.
  3. Re-ranking: Final adjustment of the nearest neighbor list based on exact distances to improve recall.
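
Putting the three stages together, below is a compact single-threaded sketch of this search pattern: a worklist-bounded best-first traversal driven by PQ-approximate distances, followed by exact re-ranking. The beam width L, the candidate-pool logic, and all names are illustrative; BANG runs these steps as batched, concurrent CPU and GPU stages rather than as one sequential loop.

```python
import heapq
import numpy as np

def greedy_search(q, graph, codes, codebooks, X, start, k=10, L=64):
    """Beam search over the index graph using PQ-approximate distances.

    graph: adjacency lists (CPU-resident in BANG's design)
    codes: (n, m) PQ codes (GPU-resident in BANG's design)
    X:     (n, d) full-precision vectors, read only for re-ranking
    L:     beam width, i.e. the size of the candidate pool
    """
    m, _, dsub = codebooks.shape
    # Stage 1: per-query distance table (query subvectors vs. centroids).
    table = ((q.reshape(m, dsub)[:, None, :] - codebooks) ** 2).sum(-1)
    approx = lambda v: float(table[np.arange(m), codes[v]].sum())

    # Stage 2: best-first traversal with a bounded candidate pool.
    visited = {start}
    frontier = [(approx(start), start)]        # min-heap of unexpanded nodes
    pool = [(-approx(start), start)]           # max-heap of the L best found
    while frontier:
        d, v = heapq.heappop(frontier)
        if len(pool) >= L and d > -pool[0][0]:
            break                              # cannot improve the pool
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = approx(u)
            if len(pool) < L or du < -pool[0][0]:
                heapq.heappush(frontier, (du, u))
                heapq.heappush(pool, (-du, u))
                if len(pool) > L:
                    heapq.heappop(pool)        # evict the worst candidate

    # Stage 3: re-rank the pool with exact distances to recover the
    # recall lost to PQ approximation.
    cand = [v for _, v in pool]
    exact = ((X[np.array(cand)] - q) ** 2).sum(axis=1)
    return [cand[i] for i in np.argsort(exact)[:k]]
```

In BANG itself, gathering neighbor lists from the CPU-resident graph and computing PQ distances on the GPU proceed concurrently for large query batches, with asynchronous transfers hiding the handoff; the sketch captures only the logical flow for a single query.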

Performance Evaluation

The method's performance was evaluated on billion-scale datasets such as SIFT1B, DEEP1B, and SPACEV1B, achieving throughputs 40x-200x higher than competing methods at a high recall of 0.9. On smaller datasets, evaluations showed that the method is almost always faster than, or comparable to, state-of-the-art methods.

Implications and Future Work

This research has significant implications for the practical application of ANNS, particularly in resource-constrained environments where only a single GPU is available. The approach of combining computations on compressed data with optimized parallel processing opens new avenues for efficient data processing at scale.

Future developments could focus on further reducing the memory footprint, improving compression techniques, and extending the method to multi-GPU systems. Additionally, exploring different graph structures and optimization algorithms could further enhance the performance and applicability of the method.

In conclusion, this paper introduces an efficient and scalable solution to ANNS challenges on a single GPU, demonstrating remarkable improvements in throughput and recall for billion-scale datasets. These advancements pave the way for more practical and widespread use of ANNS in large-scale data processing applications.
