A Sublinear Algorithm for Approximate Shortest Paths in Large Networks

(2406.08624)
Published Jun 12, 2024 in cs.DS, cs.DM, and cs.SI

Abstract

Computing distances and finding shortest paths in massive real-world networks is a fundamental algorithmic task in network analysis. There are two main approaches to solving this task. On one hand are traversal-based algorithms like bidirectional breadth-first search (BiBFS) with no preprocessing step and slow individual distance inquiries. On the other hand are indexing-based approaches, which maintain a large index. This allows for answering individual inquiries very fast; however, index creation is prohibitively expensive. We seek to bridge these two extremes: quickly answer distance inquiries without the need for costly preprocessing. In this work, we propose a new algorithm and data structure, WormHole, for approximate shortest path computations. WormHole leverages structural properties of social networks to build a sublinearly sized index, drawing upon the explicit core-periphery decomposition of Ben-Eliezer et al. Empirically, the preprocessing time of WormHole improves upon index-based solutions by orders of magnitude, and individual inquiries are consistently much faster than in BiBFS. The acceleration comes at the cost of a minor accuracy trade-off. Nonetheless, our empirical evidence demonstrates that WormHole accurately answers essentially all inquiries within a maximum additive error of 2. We complement these empirical results with provable theoretical guarantees, showing that WormHole requires $n^{o(1)}$ node queries per distance inquiry in random power-law networks. In contrast, any approach without a preprocessing step requires $n^{\Omega(1)}$ queries for the same task. WormHole does not require reading the whole graph. Unlike the vast majority of index-based algorithms, it returns paths, not just distances. For faster inquiry times, it can be combined effectively with other index-based solutions, by running them only on the sublinear core.

Figure: Disk space footprint comparison for different methods and number of vertices queried in various scenarios.

Overview

  • This paper presents WormHole, an algorithm designed to efficiently answer shortest path queries in large networks using a sublinearly sized index based on core-periphery structure.

  • The algorithm works in two phases: preprocessing and query. During preprocessing, it identifies a highly connected inner core, and during queries, it combines truncated BFS and core-centric routing to quickly find paths.

  • Theoretical results establish its efficiency and accuracy, and empirical analyses confirm reduced query costs and near-exact answers, even on networks with billions of edges.

A Sublinear Algorithm for Approximate Shortest Paths in Large Networks

This paper introduces a novel approach to efficiently answering shortest path queries in massive real-world networks, leveraging the core-periphery structure typical of many social and information networks. The authors propose an algorithm, WormHole, that combines the benefits of traversal-based and indexing-based approaches to achieve rapid query responses with minimal preprocessing overhead. Specifically, the algorithm uses a sublinearly sized index, referred to as the inner core, which is informed by the structural properties of the target networks.

Problem Statement and Novel Contribution

The core problem addressed is the inefficiency of existing methods in handling shortest path queries in large-scale networks. Traditional methods fall into two categories:

  1. Traversal-based algorithms like bidirectional breadth-first search (BiBFS) that do not require a preprocessing step but are computationally intensive for individual queries.
  2. Indexing-based algorithms that involve significant preprocessing to create a large index, allowing for fast individual queries, but are impractical for large networks due to prohibitive preprocessing times and storage requirements.

WormHole bridges these two extremes by constructing a sublinearly sized index that enables fast queries without extensive preprocessing.
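To make the baseline concrete, here is a minimal sketch of BiBFS on an unweighted graph, assuming plain adjacency-list access through a Python dict (a simplification; real deployments work over graph stores or restricted query models):

```python
def bibfs_distance(graph, s, t):
    """Level-synchronous bidirectional BFS on an unweighted graph.
    graph: dict mapping node -> iterable of neighbours.
    Returns the hop distance between s and t, or None if disconnected."""
    if s == t:
        return 0
    dist_s, dist_t = {s: 0}, {t: 0}
    frontier_s, frontier_t = {s}, {t}
    while frontier_s and frontier_t:
        # Always expand the smaller frontier; this balancing is what
        # makes BiBFS far cheaper per inquiry than one-sided BFS.
        if len(frontier_s) > len(frontier_t):
            frontier_s, frontier_t = frontier_t, frontier_s
            dist_s, dist_t = dist_t, dist_s
        next_frontier, best = set(), None
        for u in frontier_s:
            for v in graph[u]:
                if v in dist_t:  # the two search trees touch here
                    cand = dist_s[u] + 1 + dist_t[v]
                    best = cand if best is None else min(best, cand)
                elif v not in dist_s:
                    dist_s[v] = dist_s[u] + 1
                    next_frontier.add(v)
        if best is not None:
            return best
        frontier_s = next_frontier
    return None  # s and t lie in different components
```

With no index to consult, every inquiry pays to grow both frontiers until they meet, and this cost grows with the graph; that per-inquiry cost is precisely what WormHole's sublinear index is designed to avoid.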

The Algorithm: Phases and Mechanism

The WormHole algorithm operates in two phases: a preprocessing phase and a query phase.

Preprocessing Phase:

  • The algorithm identifies an inner core, denoted (C_{\text{in}}), and an outer ring, (C_{\text{out}}), using a method adapted from Ben-Eliezer et al.'s core-periphery decomposition.
  • This adaptation ensures that (C_{\text{in}}) contains vertices of high connectivity and is constructed in a sublinear fashion relative to the network size; a hedged sketch of one such greedy construction follows below.
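For intuition, here is one greedy way such a core could be grown sublinearly. The `seeds`, the `degree` oracle, and the `budget` parameter are illustrative assumptions; the paper's actual construction, adapted from Ben-Eliezer et al., differs in its details.

```python
import heapq

def build_core(graph, degree, seeds, budget):
    """Illustrative greedy core construction (not the paper's exact rule):
    repeatedly absorb the highest-degree vertex adjacent to the current
    core until `budget` vertices form the inner core C_in.
    Returns (C_in, C_out), where C_out is the neighbourhood of C_in."""
    inner = set()
    heap = [(-degree(v), v) for v in seeds]  # max-heap via negated keys
    heapq.heapify(heap)
    while heap and len(inner) < budget:
        _, v = heapq.heappop(heap)
        if v in inner:
            continue
        inner.add(v)                # absorb v into the inner core
        for u in graph[v]:          # one node query per neighbour
            if u not in inner:
                heapq.heappush(heap, (-degree(u), u))
    outer = {u for v in inner for u in graph[v]} - inner
    return inner, outer
```

Only the seeds and vertices adjacent to the growing core are ever queried, so the work scales with the size of the core rather than with the whole graph.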

Query Phase:

  • For a given query pair ((s, t)), WormHole first runs truncated BFS from both (s) and (t).
  • If the two BFS trees intersect, the algorithm returns the path directly.
  • If not, each search continues until it reaches (C_{\text{out}}). From there, the query is routed through (C_{\text{in}}), leveraging its highly connected structure to quickly compute a near-optimal path; a combined sketch follows below.
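The sketch below puts the two stages together. It assumes the outer ring is available as a set and uses a stand-in routine `core_dist` for portal-to-portal distances inside the core (in the paper, this role can be played by an exact method or an index run only on the sublinear core); all names and parameters here are illustrative, not the paper's API.

```python
def bfs_to_ring(graph, src, outer):
    """BFS from src, level by level, stopping at the first level that
    touches the outer ring C_out. Returns (distances, portal distances)."""
    dist, frontier = {src: 0}, [src]
    portals = {src: 0} if src in outer else {}
    d = 0
    while frontier and not portals:
        d += 1
        nxt = []
        for u in frontier:
            for v in graph[u]:
                if v not in dist:
                    dist[v] = d
                    nxt.append(v)
                    if v in outer:
                        portals[v] = d
        frontier = nxt
    return dist, portals

def wormhole_distance(graph, outer, core_dist, s, t):
    """Two-stage distance estimate in the spirit of WormHole's query
    phase (a sketch, not the paper's exact procedure)."""
    dist_s, portals_s = bfs_to_ring(graph, s, outer)
    dist_t, portals_t = bfs_to_ring(graph, t, outer)
    meet = set(dist_s) & set(dist_t)
    if meet:  # the truncated searches already intersect: answer directly
        return min(dist_s[v] + dist_t[v] for v in meet)
    if not portals_s or not portals_t:
        return None  # one endpoint cannot reach the core
    # Route through the core: best combination of entry and exit portals.
    return min(dist_s[ps] + core_dist(ps, pt) + dist_t[pt]
               for ps in portals_s for pt in portals_t)
```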

Theoretical Foundations and Empirical Performance

The theoretical contributions focus on proving that this algorithm maintains efficiency and accuracy under realistic conditions:

Additive Error:

  • The worst-case additive error is shown to be (O(\log \log n)) for random power-law graphs, far smaller than the typical diameter of such graphs.

Query Complexity:

  • The preprocessing query complexity is sublinear, and the per-inquiry complexity is (n^{o(1)}), significantly lower than the (n^{\Omega(1)}) queries required by any approach without a preprocessing step.

The paper supports these theoretical results with comprehensive empirical analysis:

  • Query Cost: WormHole consistently queries a significantly smaller fraction of the network than BiBFS.
  • Accuracy: Essentially all shortest path inquiries are answered within an additive error of at most 2, demonstrating high accuracy for a minor trade-off.
  • Setup and Inquiry Time: Setup completes in minutes even for billion-edge graphs, and per-inquiry times significantly outperform BiBFS and are comparable to indexing methods restricted to the inner core (an end-to-end sketch follows below).
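Tying the sketches above together, a hypothetical end-to-end run pays the setup cost once and then answers inquiries cheaply; the toy graph and every parameter choice below are illustrative only.

```python
from functools import partial

# Toy hub-and-spoke graph (adjacency lists), purely illustrative.
graph = {
    0: [1, 2, 3, 4], 1: [0, 2, 5], 2: [0, 1, 6], 3: [0, 7],
    4: [0, 8], 5: [1], 6: [2], 7: [3], 8: [4],
}
degree = lambda v: len(graph[v])
seeds = sorted(graph, key=degree, reverse=True)[:2]  # high-degree seeds

# One-time setup: grow the inner core and record its outer ring.
inner, outer = build_core(graph, degree, seeds, budget=3)

# Routing restricted to the core: here exact BiBFS on the core subgraph,
# standing in for the index-based solutions the paper can plug in.
core = inner | outer
core_graph = {v: [u for u in graph[v] if u in core] for v in core}
core_dist = partial(bibfs_distance, core_graph)

print(wormhole_distance(graph, outer, core_dist, 5, 8))  # -> 4
```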

Practical and Theoretical Implications

The development of WormHole has several practical and theoretical implications and opens avenues for future work:

  1. Scalability: This approach makes the processing of large-scale networks feasible, addressing practical constraints in storage and computation.
  2. Adaptability: The hybrid nature allows the algorithm to be tailored to specific network structures, combining traversal and index-based strategies effectively.
  3. Foundation for Further Research: The sublinear core-periphery decomposition introduces a new perspective on network preprocessing, potentially influencing the design of other scalable algorithms in graph mining and network analysis.

This work represents a significant step in advancing methodologies for large-scale network analysis, providing both a practical tool for immediate applications and a theoretical framework that can inspire future research in efficient graph algorithms.
