Efficient Routing for Cost Effective Scale-out Data Architectures

(1606.08884)
Published Jun 28, 2016 in cs.DB

Abstract

Efficient retrieval of information is of key importance when using Big Data systems. In large scale-out data architectures, data are distributed and replicated across several machines. Queries or tasks sent to such architectures go to a router, which determines the machines containing the requested data. Ideally, to reduce the overall cost of analytics, the router should return the smallest set of machines required to satisfy the query. Mathematically, this can be modeled as the set cover problem, which is NP-hard, so routing becomes a trade-off between optimality and performance. Even though an efficient greedy approximation algorithm exists for routing a single query, there is currently no better method for processing multiple queries than running the greedy set cover algorithm repeatedly, once per query. This method is impractical for Big Data systems, so state-of-the-art techniques route a query to all machines and choose as a cover the machines that respond fastest. In this paper, we propose an efficient technique to speed up the routing of a large number of real-time queries while minimizing the number of machines that each query touches (the query span). We demonstrate that by analyzing the correlation between known queries and performing query clustering, we can reduce the set cover computation time, thereby significantly speeding up the routing of unknown queries. Experiments show that our incremental set cover-based routing is 2.5 times faster and can return on average 50% fewer machines per query when compared to repeated greedy set cover and baseline routing techniques.
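
To make the single-query baseline concrete, here is a minimal sketch of the standard greedy set cover heuristic applied to routing. It is not the paper's incremental algorithm. The function name `greedy_cover`, the machine ids, and the item ids are all hypothetical and chosen only for illustration; the sketch assumes each machine is described by the set of data items it stores.

```python
def greedy_cover(query_items, machines):
    """Pick machines that together hold every item in query_items.

    machines maps a (hypothetical) machine id to the set of items it stores.
    This is the textbook greedy approximation: repeatedly take the machine
    that covers the most still-uncovered items.
    """
    uncovered = set(query_items)
    chosen = []
    while uncovered:
        # Machine covering the most uncovered items; ties go to the
        # machine that appears first in the dict.
        best = max(machines, key=lambda m: len(machines[m] & uncovered))
        gained = machines[best] & uncovered
        if not gained:
            raise ValueError(f"no machine stores items: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Illustrative data only: four machines with partially replicated items.
machines = {
    "m1": {1, 2, 3},
    "m2": {3, 4},
    "m3": {4, 5, 6},
    "m4": {1, 6},
}
print(greedy_cover({1, 2, 4, 5}, machines))  # ['m1', 'm3']
```

Here the query span is 2 (machines `m1` and `m3`), whereas sending the query to every machine would touch all four. Running this routine from scratch for every incoming query is the repeated greedy baseline that the abstract describes as impractical at scale.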
