- The paper introduces the Highways-on-Disk (HoD) technique as a disk-based index that efficiently processes single-source shortest path and distance queries on large, directed and weighted graphs.
- It iteratively removes nodes and adds distance-preserving shortcut edges, shrinking the part of the graph that must be held in memory and reducing I/O.
- Performance evaluations demonstrate that HoD significantly reduces pre-computation time and query latency compared to existing methods, showing promise for large-scale real-world applications.
Efficient Single-Source Shortest Path and Distance Queries on Large Graphs
Introduction
The paper addresses the inefficiency of existing solutions for single-source shortest path (SSSP) and single-source distance (SSD) queries on large graphs that often do not fit in main memory. The authors introduce Highways-on-Disk (HoD), a disk-based index that handles these queries on directed, weighted graphs, unlike prior techniques that target only undirected and/or unweighted graphs.
Highways-on-Disk (HoD) Technique
HoD augments the original graph with auxiliary edges, called shortcuts, and uses them during query processing to reduce both I/O and computation costs. A shortcut from u to w stands for a path through intermediate nodes and carries that path's length as its weight, so a search can skip the intermediate nodes without changing any shortest-path distance.
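The shortcut idea can be illustrated with a minimal Python sketch, assuming a graph stored as a nested dict {u: {w: weight}} with positive weights. The helper names `remove_with_shortcuts` and `dijkstra` are hypothetical and not taken from the paper. The sketch removes one node, adds the shortcuts that bypass it, and uses plain Dijkstra to check that distances among the remaining nodes are unchanged.

```python
import heapq


def dijkstra(graph, source):
    """Textbook Dijkstra on an adjacency dict {u: {w: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for w, length in graph.get(u, {}).items():
            if d + length < dist.get(w, float("inf")):
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))
    return dist


def remove_with_shortcuts(graph, v):
    """Return a copy of graph without node v, plus shortcuts bypassing v.

    For each in-neighbour u and out-neighbour w of v, add u -> w with
    weight l(u, v) + l(v, w) unless an edge u -> w of at most that
    weight already exists.
    """
    reduced = {u: dict(nbrs) for u, nbrs in graph.items() if u != v}
    for u, nbrs in reduced.items():
        if v not in nbrs:
            continue  # u is not an in-neighbour of v
        for w, l_vw in graph.get(v, {}).items():
            if w != u and nbrs[v] + l_vw < nbrs.get(w, float("inf")):
                nbrs[w] = nbrs[v] + l_vw
        del nbrs[v]
    return reduced


g = {"a": {"b": 2}, "b": {"c": 3, "d": 1}, "c": {}, "d": {"c": 1}}
reduced = remove_with_shortcuts(g, "b")
print(reduced["a"])            # {'c': 5, 'd': 3}
print(dijkstra(g, "a"))        # {'a': 0, 'b': 2, 'c': 4, 'd': 3}
print(dijkstra(reduced, "a"))  # {'a': 0, 'c': 4, 'd': 3}
```

The shortcut a→c gets weight 5 (the path a→b→c) even though the true distance is 4 via d. A shortcut's weight is always the length of a real path, so it never underestimates a distance, and the shorter route through d is still represented.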
Index Construction
- Node Selection and Shortcut Creation: Nodes are iteratively removed from the graph. When a node v is removed, each in-neighbor u of v is linked to each out-neighbor w of v by a shortcut with weight l(u, v) + l(v, w), unless an edge from u to w that is at least as short already exists. As a result, the shortest-path distance between any two remaining nodes is unchanged.
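- Core Graph Determination: Removal continues until the remaining graph, called the core graph, is small enough to fit in main memory. The index stores the core graph on disk together with a forward graph and a backward graph. Ranking nodes by removal order (core nodes highest), the forward graph holds the edges and shortcuts that lead from each removed node to higher-ranked nodes, and the backward graph holds those that lead from higher-ranked nodes into each removed node.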
- Memory and I/O Optimization: Edges are sorted and the graph structures are laid out on disk so that query processing can read them with linear scans, avoiding costly random disk accesses.
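The construction loop can be sketched in Python as a simplified, in-memory illustration rather than the paper's algorithm. `build_index` and `core_size` are hypothetical names: `core_size` stands in for the real memory budget, the removal order uses an assumed in-degree × out-degree heuristic, and the input must have positive weights and no self-loops. The sketch records each removed node's rank and its forward and backward edges (shortcuts included), then sorts the backward edges by descending target rank so they can be read in one pass.

```python
def build_index(graph, core_size):
    """Simplified in-memory sketch of HoD-style index construction.

    graph: {u: {w: weight}}, positive weights, no self-loops, and every
        node present as a key (use {} for nodes without out-edges).
    core_size: hypothetical stand-in for "the core fits in main memory".
    Returns (rank, forward, core, backward).
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    rank, forward, backward = {}, {}, []
    while len(g) > core_size:
        in_nbrs = {u: {} for u in g}
        for u, nbrs in g.items():
            for w, length in nbrs.items():
                in_nbrs[w][u] = length
        # Assumed heuristic: remove the node with the fewest potential
        # shortcuts (in-degree * out-degree); ties go to the earliest key.
        v = min(g, key=lambda x: len(in_nbrs[x]) * len(g[x]))
        rank[v] = len(rank)  # removal order = rank
        # Every neighbour of v is still present, so it will rank higher.
        forward[v] = list(g[v].items())
        backward.extend((u, v, length) for u, length in in_nbrs[v].items())
        # Shortcuts keep distances between the remaining nodes unchanged.
        for u, l_uv in in_nbrs[v].items():
            for w, l_vw in g[v].items():
                if u != w and l_uv + l_vw < g[u].get(w, float("inf")):
                    g[u][w] = l_uv + l_vw
            del g[u][v]
        del g[v]
    for u in g:
        rank[u] = len(forward)  # core nodes share the highest rank
    # Backward edges sorted by descending target rank, so a query can
    # process them in one sequential pass.
    backward.sort(key=lambda edge: -rank[edge[1]])
    return rank, forward, g, backward


G = {"a": {"b": 1}, "b": {"c": 2, "d": 5}, "c": {"d": 1, "a": 3},
     "d": {"e": 2}, "e": {"b": 1}}
rank, forward, core, backward = build_index(G, core_size=3)
print(backward)  # [('d', 'e', 2), ('c', 'a', 3)]
```

On this graph, a and then e are removed, leaving the core {b, c, d} with shortcuts c→b (4) and d→b (3).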
Query Processing
Queries are processed in three stages:
- Forward Search: Starting from the query's source node, traverse the forward graph, following only edges that lead to higher-ranked nodes (nodes removed later, or core nodes), to obtain tentative distances.
- Core Search: Run a priority-queue (Dijkstra-style) search on the in-memory core graph, seeded with the tentative distances the forward search found for core nodes.
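- Backward Search: Scan the backward graph sequentially in decreasing rank order. Each non-core node's distance becomes the smaller of its tentative distance and the best value of a higher-ranked in-neighbor's distance plus the connecting edge weight; because higher-ranked nodes are finalized first, one pass suffices.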
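Only the core graph needs to reside in main memory; the forward and backward graphs stay on disk, and their layout lets queries read them with access patterns close to linear scans. The stages are correct because, thanks to the shortcuts, every shortest path has an equally short counterpart in the augmented graph that first climbs to higher-ranked nodes, optionally moves within the core, and then descends in rank.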
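The three stages can be simulated in memory with the Python sketch below. `hod_query` is a hypothetical name, nothing is actually read from disk, and implementing the forward search as a Dijkstra search restricted to upward edges is an assumption about how that traversal is organized. The index is hand-built from the directed graph a→b (1), b→c (2), b→d (5), c→d (1), c→a (3), d→e (2), e→b (1) by removing a and then e, which leaves the core {b, c, d} plus shortcuts c→b (4) and d→b (3).

```python
import heapq

INF = float("inf")


def hod_query(source, forward, core, backward):
    """In-memory simulation of a three-stage HoD-style query.

    forward:  {node: [(higher_ranked_node, weight), ...]}
    core:     {core_node: {core_node: weight}}, every core node a key
    backward: [(u, v, weight), ...] with rank(u) > rank(v), sorted by
              descending rank(v) so it can be read in one sequential pass
    Returns {node: distance} for every node reachable from source.
    """
    # Stage 1: forward search, Dijkstra over upward edges only (assumed).
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for w, length in forward.get(u, []):
            if d + length < dist.get(w, INF):
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))

    # Stage 2: Dijkstra on the core graph, seeded with the tentative
    # distances that stage 1 found for core nodes.
    heap = [(d, u) for u, d in dist.items() if u in core]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for w, length in core[u].items():
            if d + length < dist.get(w, INF):
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))

    # Stage 3: one linear pass over the backward graph. All edges into
    # higher-ranked nodes come earlier, so dist[u] is already final here.
    for u, v, length in backward:
        if dist.get(u, INF) + length < dist.get(v, INF):
            dist[v] = dist[u] + length
    return dist


# Built from a->b 1, b->c 2, b->d 5, c->d 1, c->a 3, d->e 2, e->b 1
# by removing a first, then e; the core is {b, c, d}.
core = {"b": {"c": 2, "d": 5}, "c": {"d": 1, "b": 4}, "d": {"b": 3}}
forward = {"a": [("b", 1)], "e": [("b", 1)]}
backward = [("d", "e", 2), ("c", "a", 3)]
print(hod_query("a", forward, core, backward))
# {'a': 0, 'b': 1, 'c': 3, 'd': 4, 'e': 6}
```

These values match a plain Dijkstra run from a on the original graph; for instance, e is reached at distance 6 through the backward edge d→e after the core search settles d at 4.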
Performance Evaluation
HoD requires significantly less pre-computation time and space than existing solutions such as VC-Index while answering queries faster. On undirected graphs, it outperforms alternatives by margins that often reach an order of magnitude on massive datasets. Its support for directed graphs is a further advantage over existing methods that are restricted to undirected data.
Implications and Future Directions
HoD's scalability and efficient use of disk make it well suited to real-world applications, such as social networks and map services, that involve massive graphs. Future work might extend the framework to dynamic graphs that change over time, or integrate point-to-point shortest path queries into the same disk-based framework, broadening its applicability.
In conclusion, HoD provides a solid foundation for SSSP and SSD queries on large graphs and a promising basis for future work on disk-based graph processing.