Locality-based Graph Reordering for Processing Speed-Ups and Impact of Diameter (2111.12281v2)

Published 24 Nov 2021 in cs.AR

Abstract: Graph analysis involves a high number of random memory accesses. Earlier research has shown that cache miss latency is responsible for more than half of the graph processing time, with CPU execution having the smaller share. There has been significant study on decreasing the CPU computing time, for example by employing better cache prefetching and replacement policies. In this paper, we study the various methods that do so by attempting to decrease the CPU cache miss ratio. Graph reordering attempts to exploit the power-law distribution of graphs (a few sparsely distributed vertices in the graph have a high number of connections) to keep the frequently accessed vertices together locally and hence decrease cache misses. However, reordering the graph by keeping the hot vertices together may affect the spatial locality of the graph, and thus add to the total CPU compute time. We also need control over the total reordering time and its inverse relation with the final CPU execution time. To exploit this trade-off between reordering by vertex hotness and spatial locality, we introduce the lightweight Community-based Reordering. We attempt to maintain the community structure of the graph by storing the hot members of each community locally together. The implementation also takes into consideration the impact of graph diameter on the execution time. We compare our implementation with other reordering implementations and find a significantly better result on five graph processing algorithms: BFS, CC, CCSV, PR and BC. Lorder achieved a speed-up of up to 7x and an average speed-up of 1.2x compared to other reordering algorithms.
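
The idea of keeping hot vertices together while preserving community structure can be illustrated with a small sketch. This is not the paper's Lorder implementation. It assumes that a community assignment is already available, that "hot" means a degree above the graph's average degree, and that the diameter is ignored. The names `community_hot_reorder` and `relabel` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def community_hot_reorder(adj, community):
    """Return a mapping old vertex id -> new vertex id.

    adj:       dict mapping each vertex to a list of its neighbours
    community: dict mapping each vertex to a community id (assumed given)

    Assumption for this sketch: a vertex is "hot" when its degree is
    above the average degree of the graph.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    avg_degree = sum(degree.values()) / len(adj) if adj else 0.0

    # Group vertices by community, keeping the original id order inside each.
    members = defaultdict(list)
    for v in sorted(adj):
        members[community[v]].append(v)

    order = []
    for cid in sorted(members):
        hot = [v for v in members[cid] if degree[v] > avg_degree]
        cold = [v for v in members[cid] if degree[v] <= avg_degree]
        # Hot vertices go first, highest degree first (the sort is stable).
        hot.sort(key=lambda v: -degree[v])
        # Cold vertices keep their original relative order, so each
        # community still occupies one contiguous id range.
        order.extend(hot)
        order.extend(cold)

    return {old: new for new, old in enumerate(order)}


def relabel(adj, new_id):
    """Rebuild the adjacency lists using the new vertex ids."""
    return {
        new_id[v]: sorted(new_id[u] for u in nbrs)
        for v, nbrs in adj.items()
    }


if __name__ == "__main__":
    # Two communities, {0, 1, 2} and {3, 4, 5}, with hubs 2 and 5.
    adj = {0: [2], 1: [2], 2: [0, 1, 5], 3: [5], 4: [5], 5: [2, 3, 4]}
    community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    new_id = community_hot_reorder(adj, community)
    print(new_id)
    print(relabel(adj, new_id))
```

In this toy graph each community keeps a contiguous range of ids, and the hub of each community takes the first id in its range. A real implementation would also need to bound the reordering cost itself, which this sketch does not attempt to do.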