Emergent Mind

Abstract

Applications increasingly leverage mixed-modality data, and must jointly search over vector data, such as embedded images, text and video, as well as structured data, such as attributes and keywords. Proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making them impractical for many applications. To address this, we present ACORN, an approach for performant and predicate-agnostic hybrid search. ACORN builds on Hierarchical Navigable Small Worlds (HNSW), a state-of-the-art graph-based approximate nearest neighbor index, and can be implemented efficiently by extending existing HNSW libraries. ACORN introduces the idea of predicate subgraph traversal to emulate a theoretically ideal, but impractical, hybrid search strategy. ACORN's predicate-agnostic construction algorithm is designed to enable this effective search strategy, while supporting a wide array of predicate sets and query semantics. We systematically evaluate ACORN on both prior benchmark datasets, with simple, low-cardinality predicate sets, and complex multi-modal datasets not supported by prior methods. We show that ACORN achieves state-of-the-art performance on all datasets, outperforming prior methods with 2-1,000x higher throughput at a fixed recall.

Green nodes depict the predicate subgraph, over which ACORN emulates search on an oracle partition index.

Overview

  • ACORN introduces an innovative approach to hybrid search, combining unstructured vector data and structured attributes without limiting predicate types.

  • It redefines the HNSW indexing algorithm to support effective hybrid querying, introducing ACORN-$\gamma$ for search performance and ACORN-1 for construction efficiency.

  • ACORN outperforms existing baselines on both low-cardinality predicate set (LCPS) and high-cardinality predicate set (HCPS) benchmarks, delivering higher query throughput at a fixed recall.

  • The method enables advanced search functionalities and sets new performance benchmarks for applications requiring complex, mixed-modality data handling.

ACORN: Advancing Hybrid Search with Predicate-Agnostic Vector and Structured Data Indexing

Introduction to Hybrid Search Challenges

Hybrid search, which entails querying over both unstructured vector data and structured attributes, is central to numerous modern applications, from e-commerce platforms to scholarly article repositories. Despite its widespread utility, hybrid search presents significant computational challenges. Existing solutions often compromise either on search performance due to inefficient handling of mixed data types or on query expressiveness by restricting the types of searchable predicates. Addressing these limitations, the paper introduces ACORN, an approach designed to efficiently perform hybrid search across vectors and structured data without constraining predicate types.
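To make the task concrete, here is a minimal sketch of what a hybrid query is meant to return: the k nearest vectors among only those records whose structured attributes satisfy a predicate. It is an exact brute-force baseline, not ACORN itself; the function names, the toy data, and the `year` attribute are all assumptions made for illustration.

```python
import heapq
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_hybrid_search(
    query: Vector,
    vectors: Sequence[Vector],
    attributes: Sequence[dict],
    predicate: Callable[[dict], bool],
    k: int,
) -> list:
    """Return ids of the k nearest vectors whose attributes pass the predicate."""
    passing = (
        (l2_distance(query, vec), idx)
        for idx, (vec, attr) in enumerate(zip(vectors, attributes))
        if predicate(attr)  # structured filter; can be any Python callable
    )
    return [idx for _, idx in heapq.nsmallest(k, passing)]


# Hypothetical toy data: four 2-D vectors with a "year" attribute each.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
attributes = [{"year": 2019}, {"year": 2021}, {"year": 2022}, {"year": 2023}]

result = exact_hybrid_search(
    (0.0, 0.0), vectors, attributes, lambda a: a["year"] >= 2021, k=2
)
print(result)
```

Record 0 is the closest vector to the query but fails the predicate, so it is excluded. This scan is linear in the dataset size, which is why an approximate index is needed at scale.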

ACORN Overview

ACORN stands for ANN Constraint-Optimized Retrieval Network. It adapts the Hierarchical Navigable Small Worlds (HNSW) indexing algorithm to support hybrid queries. ACORN comes in two variants: ACORN-$\gamma$, which emphasizes search performance, and ACORN-1, which reduces construction overhead. The primary innovation is search over predicate subgraphs, that is, the subgraphs of the index formed by the nodes that satisfy a given predicate. By ensuring that these subgraphs resemble an HNSW index built only over the passing points, ACORN narrows the performance gap between pure vector search and hybrid search.
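The idea of traversing only the predicate subgraph can be sketched on a single-layer graph. This is a simplified illustration, not the paper's algorithm: it ignores the HNSW hierarchy, assumes distances to the query are precomputed, and assumes the entry node passes the predicate. All names (`predicate_subgraph_search`, `ef`, the toy graph) are hypothetical.

```python
import heapq


def predicate_subgraph_search(graph, dist_to_query, passes, entry, k, ef):
    """Best-first search that only visits nodes satisfying the predicate.

    graph:         dict mapping node -> list of neighbor nodes
    dist_to_query: dict mapping node -> distance to the query (precomputed)
    passes:        set of nodes that satisfy the predicate
    entry:         starting node, assumed to be in `passes`
    ef:            size of the dynamic result list (ef >= k)
    """
    visited = {entry}
    candidates = [(dist_to_query[entry], entry)]   # min-heap on distance
    results = [(-dist_to_query[entry], entry)]     # max-heap via negation
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest candidate is worse than the worst result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            # Skip neighbors outside the predicate subgraph entirely.
            if nb in visited or nb not in passes:
                continue
            visited.add(nb)
            d_nb = dist_to_query[nb]
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    ordered = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ordered][:k]


# Toy 6-node ring-like graph; nodes 1 and 3 fail the predicate.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
dist_to_query = {0: 5.0, 1: 4.0, 2: 3.0, 3: 0.5, 4: 2.0, 5: 1.0}
passes = {0, 2, 4, 5}

result = predicate_subgraph_search(graph, dist_to_query, passes, entry=0, k=2, ef=3)
print(result)
```

Node 3 is the nearest point overall, but it is never visited because it lies outside the predicate subgraph. The search still reaches node 5 through passing nodes only, which works here because the subgraph happens to stay connected.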

Performance Benchmarks

In the paper's evaluation across several datasets, ACORN reports the following results:

  • LCPS Benchmarks: On low-cardinality predicate set (LCPS) benchmarks, which previous specialized indices can handle, ACORN-$\gamma$ achieves 2--10$\times$ higher queries per second (QPS) at 0.9 recall than these specialized methods.
  • HCPS Benchmarks: For high-cardinality predicate sets (HCPS), representing more complex real-world scenarios, ACORN-$\gamma$ continues to outperform existing baselines by over 30$\times$ in QPS at equal recall levels.
  • Construction Efficiency: ACORN-$\gamma$ has a higher time-to-index (TTI) than HNSW, a cost it trades for significant gains in search performance. ACORN-1, by contrast, achieves a TTI on par with or better than existing methods, making it a viable option for resource-constrained scenarios.
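The two metrics behind these comparisons, recall and QPS, can be computed as in the sketch below. It assumes recall is measured as recall@k against exact filtered ground truth and that QPS is simple single-threaded wall-clock throughput; the paper's exact measurement protocol may differ, and the helper names are hypothetical.

```python
import time


def recall_at_k(retrieved, ground_truth, k):
    """Fraction of true top-k neighbors that appear in the retrieved top-k.

    retrieved / ground_truth: one list of ids per query, best first.
    Queries with fewer than k true matches contribute only what exists.
    """
    hits = 0
    total = 0
    for found, truth in zip(retrieved, ground_truth):
        truth_set = set(truth[:k])
        hits += len(truth_set & set(found[:k]))
        total += len(truth_set)
    return hits / total if total else 1.0


def queries_per_second(search_fn, queries):
    """Single-threaded throughput of search_fn over a batch of queries."""
    start = time.perf_counter()
    for q in queries:
        search_fn(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


# Two hypothetical queries: the first misses one of its three true neighbors.
retrieved = [[1, 2, 3], [4, 5, 6]]
ground_truth = [[1, 2, 9], [4, 5, 6]]
recall = recall_at_k(retrieved, ground_truth, k=3)
print(round(recall, 3))
```

Comparing methods "at 0.9 recall" then means tuning each method's search parameters until `recall_at_k` reaches 0.9 and reporting the QPS measured at that setting.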

Technical Innovations

Two ideas are central to ACORN's efficiency: traversing predicate subgraphs during search, and constructing a denser graph so that those subgraphs remain navigable. A predicate-agnostic pruning strategy applied during construction, together with a tunable neighbor expansion factor, lets ACORN adapt to different datasets and query predicates.
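One way to picture why a denser graph helps is the arithmetic below. It is an illustrative assumption rather than something stated in this summary: suppose each node stores up to `M * gamma` neighbors, a predicate passes a fraction `selectivity` of the points independently of the graph structure, and search keeps the first `M` passing neighbors of each node. The names `M`, `gamma`, and `filtered_neighbors` are hypothetical.

```python
def expected_passing_neighbors(M, gamma, selectivity):
    """Expected number of stored neighbors that pass the predicate,
    assuming passing is independent of neighbor-list membership."""
    return M * gamma * selectivity


def filtered_neighbors(neighbor_list, passes, M):
    """Keep neighbors that pass the predicate, then truncate to M entries."""
    kept = [n for n in neighbor_list if passes(n)]
    return kept[:M]


# With M = 4 and a predicate passing 25% of points, a plain list of 4
# neighbors leaves about 1 usable edge; expanding by gamma = 4 restores ~4.
plain = expected_passing_neighbors(M=4, gamma=1, selectivity=0.25)
expanded = expected_passing_neighbors(M=4, gamma=4, selectivity=0.25)
print(plain, expanded)

# Toy expanded neighbor list of 12 ids; every third id passes the predicate.
neighbors = list(range(12))
usable = filtered_neighbors(neighbors, lambda n: n % 3 == 0, M=2)
print(usable)
```

Under these assumptions, the expansion factor trades index size and construction time for enough surviving edges per node to keep the predicate subgraph connected, which is consistent with the higher time-to-index reported for ACORN-$\gamma$.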

Theoretical and Practical Implications

ACORN's design philosophy underscores a critical insight: hybrid search need not be confined by the limitations of existing data structures, nor should it compromise on query expressiveness. Practically, ACORN opens up new avenues for building more robust, efficient, and versatile search functionalities in applications that require dealing with complex, mixed-modality data.

Future Directions

ACORN's performance across diverse datasets and query types suggests significant potential for future work. Immediate next steps include exploring how ACORN adapts to other graph-based indices and further optimizing its construction for even larger datasets. Investigating the integration of ACORN into distributed search systems could further extend its utility.

Conclusion

ACORN represents a significant step forward in the endeavor to provide efficient and expressive hybrid search capabilities. Its innovative approach to indexing and searching across mixed-modality data not only sets a new benchmark for performance but also broadens the horizon for query functionalities available to modern applications.
