Distributed Data Processing Frameworks for Big Graph Data

(1612.05859)
Published Dec 18, 2016 in cs.DC

Abstract

We now create so much data (2.5 quintillion bytes every day) that 90% of the data in the world today has been created in the last two years alone [1]. This data comes from sensors used to gather traffic or climate information, posts to social media sites, photos, videos, emails, purchase transaction records, call logs of cellular networks, and so on. This data is big data. In this report, we first briefly discuss the programming models used for big data processing, then focus on graph data and survey the programming models and frameworks used to solve graph problems at very large scale. In Section 2, we introduce programming models that are not specifically designed to handle graph data; we include them in this survey because they are important frameworks and/or there have been studies customizing them for more efficient graph processing. In Section 3, we discuss techniques that yield up to 1340x speedups for certain graph problems when applied to Hadoop. In Section 4, we discuss the vertex-based programming model, which is designed specifically for processing large graphs, and the frameworks that adopt it. In Section 5, we implement two fundamental graph algorithms (PageRank and Weighted Bipartite Matching) and run them on a single node as a baseline to see how fast they are on large datasets and whether partitioning them is worthwhile.
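The paper's single-node baseline for PageRank is not shown on this page, so the snippet below is only a minimal sketch of the standard power-iteration formulation such a baseline would compute. The data layout (an adjacency dict), damping factor of 0.85, iteration count, and all names are illustrative assumptions, not details taken from the paper.

```python
# Minimal single-node PageRank sketch (power iteration).
# Assumptions: graph is a dict {node: [out-neighbors]}; damping factor,
# iteration count, and names are illustrative, not from the paper.

def pagerank(graph, damping=0.85, iterations=50):
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # base "teleport" mass given to every node
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, out_neighbors in graph.items():
            if out_neighbors:
                # spread v's current rank evenly over its out-edges
                share = damping * rank[v] / len(out_neighbors)
                for u in out_neighbors:
                    new_rank[u] += share
            else:
                # dangling node: redistribute its rank uniformly
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for node, score in sorted(pagerank(toy_graph).items()):
        print(node, round(score, 4))
```

On a single machine this runs entirely in memory, which is exactly why the paper uses it as the baseline for judging when partitioning the graph across nodes becomes worthwhile.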
