ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data (2403.04871v1)

Published 7 Mar 2024 in cs.IR and cs.DB

Abstract: Applications increasingly leverage mixed-modality data, and must jointly search over vector data, such as embedded images, text and video, as well as structured data, such as attributes and keywords. Proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making them impractical for many applications. To address this, we present ACORN, an approach for performant and predicate-agnostic hybrid search. ACORN builds on Hierarchical Navigable Small Worlds (HNSW), a state-of-the-art graph-based approximate nearest neighbor index, and can be implemented efficiently by extending existing HNSW libraries. ACORN introduces the idea of predicate subgraph traversal to emulate a theoretically ideal, but impractical, hybrid search strategy. ACORN's predicate-agnostic construction algorithm is designed to enable this effective search strategy, while supporting a wide array of predicate sets and query semantics. We systematically evaluate ACORN on both prior benchmark datasets, with simple, low-cardinality predicate sets, and complex multi-modal datasets not supported by prior methods. We show that ACORN achieves state-of-the-art performance on all datasets, outperforming prior methods with 2-1,000x higher throughput at a fixed recall.


Summary

The paper introduces ACORN, a hybrid search method that efficiently indexes predicate subgraphs to unify vector and structured data search.
It adapts the HNSW algorithm and reports 2–10× higher QPS on low-cardinality predicate sets and over 30× on complex, high-cardinality predicate sets, at fixed recall.
ACORN comes in two variants: ACORN-γ, tuned for search performance, and ACORN-1, tuned for efficient construction.

ACORN: Advancing Hybrid Search with Predicate-Agnostic Vector and Structured Data Indexing

Introduction to Hybrid Search Challenges

Hybrid search, which entails querying over both unstructured vector data and structured attributes, is central to numerous modern applications, from e-commerce platforms to scholarly article repositories. Despite its widespread utility, hybrid search presents significant computational challenges. Existing solutions often compromise either on search performance due to inefficient handling of mixed data types or on query expressiveness by restricting the types of searchable predicates. Addressing these limitations, the paper introduces ACORN, an approach designed to efficiently perform hybrid search across vectors and structured data without constraining predicate types.

ACORN Overview

ACORN stands for ANN Constraint-Optimized Retrieval Network. It adapts the Hierarchical Navigable Small Worlds (HNSW) indexing algorithm to support hybrid queries. The paper introduces two variants: ACORN-$\gamma$, which prioritizes search performance, and ACORN-1, which reduces construction overhead. The primary innovation is search over predicate subgraphs, that is, the subgraphs of the index induced by the nodes that satisfy a given query predicate. By ensuring these subgraphs resemble an ideal HNSW index, ACORN narrows the performance gap between pure vector search and hybrid search.
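To make the notion of a predicate subgraph concrete, here is a minimal sketch in Python. The adjacency-dict representation, the `predicate_subgraph` helper, and the toy `year` attribute are all assumptions made for illustration; they are not taken from the paper or from any HNSW library.

```python
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]  # node id -> list of out-neighbor ids


def predicate_subgraph(graph: Graph, passes: Callable[[int], bool]) -> Graph:
    """Return the subgraph induced by the nodes for which passes(node) is True.

    Edges are kept only when both endpoints satisfy the predicate.
    """
    kept = {node for node in graph if passes(node)}
    return {
        node: [nb for nb in graph[node] if nb in kept]
        for node in graph
        if node in kept
    }


# Toy index graph with one structured attribute per node (hypothetical data).
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
year = {0: 2019, 1: 2023, 2: 2024, 3: 2024}

# Predicate: "published in 2023 or later".
sub = predicate_subgraph(graph, lambda n: year[n] >= 2023)
print(sub)  # {1: [3], 2: [3], 3: [1, 2]}
```

In this toy case node 0 fails the predicate, so it and every edge touching it drop out. A search restricted to the remaining nodes works well only if they stay well connected, which is the property the construction aims to preserve.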

Performance Benchmarks

In an evaluation across several datasets, the paper reports the following results:

LCPS Benchmarks: On low-cardinality predicate set (LCPS) benchmarks, which earlier specialized indices can handle, ACORN-$\gamma$ achieves 2--10$\times$ higher queries per second (QPS) at 0.9 recall than these specialized methods.
HCPS Benchmarks: On high-cardinality predicate sets (HCPS), which represent more complex real-world scenarios, ACORN-$\gamma$ outperforms existing baselines by over 30$\times$ in QPS at equal recall.
Construction Efficiency: ACORN-$\gamma$ has a higher time-to-index (TTI) than HNSW, a cost it trades for its gains in search performance. ACORN-1, by contrast, achieves a TTI on par with or better than existing methods, making it a viable option for resource-constrained scenarios.
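The two metrics behind these comparisons, recall and QPS, can be sketched as below. This uses the standard definition of recall@k for filtered search (the share of the true k nearest predicate-passing neighbors that the index returns). The function names and the timing loop are illustrative assumptions, not the paper's evaluation harness.

```python
import time
from typing import Callable, Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k ids that appear in the retrieved top-k ids."""
    truth = set(ground_truth[:k])
    if not truth:
        return 1.0  # nothing passes the predicate, so nothing can be missed
    hits = len(truth.intersection(retrieved[:k]))
    return hits / len(truth)


def queries_per_second(search: Callable[[object], object], queries: Sequence[object]) -> float:
    """Run every query once and report throughput in queries per second."""
    start = time.perf_counter()
    for query in queries:
        search(query)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


# The index returned ids [1, 2, 3, 9]; the exact filtered top-4 is [1, 2, 3, 4].
example_recall = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4)
print(example_recall)  # 0.75
```

Comparing QPS "at fixed recall" means tuning each method's search parameters until it reaches the same recall (for example 0.9) and only then comparing throughput.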

Technical Innovations

ACORN's efficiency rests on two ideas: traversing predicate subgraphs during search, and constructing denser graphs so that those subgraphs stay navigable. A predicate-agnostic pruning strategy during construction, together with a configurable neighbor expansion factor, lets ACORN adapt across datasets and query predicates.
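The search-side idea can be illustrated with a simplified, single-layer sketch. It is not the paper's algorithm: it omits the HNSW hierarchy and the construction-time pruning, and it assumes the entry point already satisfies the predicate. The names `filtered_graph_search`, `m`, and `ef`, and the step of truncating each filtered neighbor list to `m` entries, are assumptions made for this example.

```python
import heapq
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_graph_search(graph: Dict[int, List[int]], vectors: Dict[int, Vector],
                          passes: Callable[[int], bool], query: Vector,
                          entry: int, k: int, m: int, ef: int = 16) -> List[int]:
    """Best-first search that only ever visits nodes satisfying the predicate."""
    def filtered_neighbors(node: int) -> List[int]:
        # Drop neighbors that fail the predicate, then keep at most m of them.
        return [nb for nb in graph[node] if passes(nb)][:m]

    d0 = sq_dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap of nodes still to expand
    results = [(-d0, entry)]     # max-heap (negated distance) of best nodes so far
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef and dist > -results[0][0]:
            break  # closest open candidate is worse than the worst kept result
        for nb in filtered_neighbors(node):
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = sq_dist(vectors[nb], query)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    best = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in best[:k]]


# Toy data: five points on a line, densely connected; predicate keeps even ids.
vectors = {i: (float(i), 0.0) for i in range(5)}
graph = {i: [j for j in range(5) if j != i] for i in range(5)}
top = filtered_graph_search(graph, vectors, lambda n: n % 2 == 0,
                            query=(3.6, 0.0), entry=0, k=2, m=4)
print(top)  # [4, 2]
```

The toy graph is dense on purpose. Once failing neighbors are filtered out, each node still has enough passing neighbors to keep the search connected, which mirrors the reason for building a denser index in the first place.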

Theoretical and Practical Implications

ACORN's design philosophy underscores a critical insight: hybrid search need not be confined by the limitations of existing data structures, nor should it compromise on query expressiveness. Practically, ACORN opens up new avenues for building more robust, efficient, and versatile search functionalities in applications that require dealing with complex, mixed-modality data.

Future Directions

ACORN's performance across diverse datasets and query types suggests significant potential for future work. Immediate next steps include exploring its adaptability to other graph-based indices and further optimizing its construction for even larger datasets. Integrating ACORN into distributed search systems could further extend its utility and impact.

Conclusion

ACORN represents a significant step forward in the endeavor to provide efficient and expressive hybrid search capabilities. Its innovative approach to indexing and searching across mixed-modality data not only sets a new benchmark for performance but also broadens the horizon for query functionalities available to modern applications.
