Thursday, November 7, 2013

Optimizations and Analysis of BSP Graph Processing Models on Public Clouds

In this paper, the authors focus on the inefficiency of Pregel on complex graph algorithms such as betweenness centrality. Such algorithms tend to have a highly varying number of messages per superstep: using few workers may cause memory to run out at the peak messaging superstep, while renting many workers is costly and leaves most of them nearly idle during most supersteps. The authors therefore propose adaptive vertex and superstep scheduling heuristics.

In their vertex scheduling model, computation is started only for a subset of vertices, called a swath, at a time. They propose two heuristics for finding the best swath size. The sampling-run method runs small sample swaths, measures their peak memory usage, and extrapolates to find the largest swath size that fits in memory. The adaptive method bases the next swath size on the peak memory usage of the previous one.
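The two sizing heuristics can be sketched roughly as follows. This is my own illustration, not the paper's code: the function names, the linear peak-memory-per-vertex model in the sampling variant, and the damping factor in the adaptive variant are all assumptions.

```python
def sampling_run_size(sample_sizes, sample_peaks, memory_budget):
    """Sampling-run heuristic (sketch): run a few small sample swaths,
    assume peak memory grows roughly linearly with swath size, and pick
    the largest swath that fits the memory budget."""
    # Per-vertex peak-memory ratio observed in each sample run;
    # take the worst (largest) ratio to be conservative.
    per_vertex = max(m / s for s, m in zip(sample_sizes, sample_peaks))
    return int(memory_budget / per_vertex)

def adaptive_size(prev_size, prev_peak, memory_budget, damping=0.5):
    """Adaptive heuristic (sketch): scale the next swath by how far the
    previous swath's peak memory was from the budget, damped (my
    assumption) to avoid oscillating around the limit."""
    ratio = memory_budget / prev_peak
    return int(prev_size * (1 + damping * (ratio - 1)))
```

For example, if sample swaths of 100 and 200 vertices peaked at 10 and 20 memory units, a 100-unit budget would allow a swath of about 1000 vertices.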

For finding the best time to start a new swath they again have two methods; the baseline of starting the next swath only after the previous one completes is not efficient. The static method starts the next swath after a fixed number of supersteps equal to the average shortest path length. The dynamic method monitors memory utilization and starts the next swath once peak utilization has passed.
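The dynamic method's peak-detection step could look something like the sketch below. The paper does not give this detail; the "declining for a few consecutive supersteps" rule and the window size are my assumptions about one plausible way to decide the peak has passed.

```python
def peak_has_passed(memory_history, window=3):
    """Sketch of the dynamic initiation trigger: declare the messaging
    peak over once per-superstep memory usage has strictly declined for
    `window` consecutive supersteps (assumed rule, not the paper's)."""
    if len(memory_history) < window + 1:
        return False  # not enough history to judge a trend
    recent = memory_history[-(window + 1):]
    return all(a > b for a, b in zip(recent, recent[1:]))
```

The coordinator would call this after each superstep with the observed memory usage series and launch the next swath once it returns true.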

They also evaluate the impact of partitioning by comparing hash partitioning with METIS. As expected, METIS helps PageRank on both experimented graphs. But in the BC and APSP applications, while it helps on the web-Google dataset, it surprisingly does not help on the cit-Patents dataset. They attribute this to the increase in work imbalance between workers caused by the partitioning, and support the claim with additional figures showing that although computation time decreases with METIS, synchronization barrier wait time increases, leaving the total time per superstep almost unchanged. However, this explanation does not cover why the same problem is not observed on web-Google.

They finally evaluate using the cloud's elastic choice of worker count instead of the swath initiation heuristics. They show that in supersteps where computation is high, adding more workers helps, but when computation is low, more workers hurt due to the increased barrier synchronization cost. The elastic worker count achieves a significant cost reduction compared to a fixed 8 workers, with similar runtime. However, they do not actually scale the number of workers dynamically; instead they run two experiments, with 4 and 8 workers, and for each superstep pick whichever configuration runs faster. The cost of actually rescaling might be high enough to make the approach useless in practice. In addition, AWS charges per hour, so changing the worker count frequently might be very costly if the computation fluctuates greatly.
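The evaluation trick described above, taking the faster of the 4-worker and 8-worker runs at every superstep and pricing each superstep at the chosen configuration's rate, can be sketched as follows. The function and the unit prices are illustrative assumptions, not values from the paper, and the sketch deliberately ignores the per-hour billing granularity and rescaling overhead that the review criticizes.

```python
def elastic_estimate(times_4, times_8, price_4=4.0, price_8=8.0):
    """Sketch of the paper's elastic-worker evaluation: per superstep,
    take the faster of the 4-worker and 8-worker measured runs, and
    charge that superstep at the chosen cluster's rate (assumed prices
    in arbitrary cost units per time unit).
    Returns (total_time, total_cost)."""
    total_time = 0.0
    total_cost = 0.0
    for t4, t8 in zip(times_4, times_8):
        if t8 < t4:            # 8 workers faster: heavy superstep
            total_time += t8
            total_cost += t8 * price_8
        else:                  # 4 workers faster (or tied): light superstep
            total_time += t4
            total_cost += t4 * price_4
    return total_time, total_cost
```

This makes the reviewer's objection concrete: the estimate assumes switching between 4 and 8 workers is free and instantaneous, which per-hour billing and state migration costs would invalidate.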

In this paper, the authors implement Pregel on .NET. They claim that Giraph is problematic because it is built on Hadoop and suffers from Hadoop's side effects, but they neither explicitly state what these side effects are nor compare against Giraph.

An important advantage of Pregel.NET is that workers use multiple threads to effectively exploit all cores on a worker machine. On the other hand, it has no combiners, aggregators, or fault tolerance.

However, in most graph problems the vertices need to execute together: vertex values change simultaneously until convergence is achieved for all vertices. Starting vertices at different times may therefore be applicable only to the small subset of graph problems in which vertices can execute independently.

The paper has a comprehensive related work section covering Pregel and other BSP implementations such as GPS, Giraph, Hama, and Trinity. However, it claims Giraph is inferior because it buffers messages on disk instead of in memory, which is not correct.
