Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Let them have CAKES: A Cutting-Edge Algorithm for Scalable, Efficient, and Exact Search on Big Data (2309.05491v2)

Published 11 Sep 2023 in cs.DS

Abstract: The ongoing Big Data explosion has created a demand for efficient and scalable algorithms for similarity search. Most recent work has focused on \textit{approximate} $k$-NN search, and while this may be sufficient for some applications, \textit{exact} $k$-NN search would be ideal for many applications. We present CAKES, a set of three novel, exact algorithms for $k$-NN search. CAKES's algorithms are generic over \textit{any} distance function, and they \textit{do not} scale with the cardinality or embedding dimension of the dataset, but rather with its metric entropy and fractal dimension. We test these claims on datasets from the ANN-Benchmarks suite under commonly-used distance functions, as well as on a genomic dataset with Levenshtein distance and a radio-frequency dataset with Dynamic Time Warping distance. We demonstrate that CAKES exhibits near-constant scaling with cardinality on data conforming to the manifold hypothesis, and has perfect recall on data in \textit{metric} spaces. We also demonstrate that CAKES exhibits significantly higher recall than state-of-the-art $k$-NN search algorithms when the distance function is not a metric. Additionally, we show that indexing and tuning time for CAKES is an order of magnitude, or more, faster than state-of-the-art approaches. We conclude that CAKES is a highly efficient and scalable algorithm for exact $k$-NN search on Big Data. We provide a Rust implementation of CAKES.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Morgan E. Prior (2 papers)
  2. Thomas J. Howard III (2 papers)
  3. Oliver McLaughlin (1 paper)
  4. Terrence Ferguson (1 paper)
  5. Najib Ishaq (4 papers)
  6. Noah M. Daniels (7 papers)
Citations (1)

Summary

We haven't generated a summary for this paper yet.