Thursday, November 7, 2013

Signal/Collect

Signal/Collect is a framework for synchronous and asynchronous parallel graph processing. It differs from Pregel in two main ways: (1) edges can carry computation too, through the signal function, so you essentially write one compute()-style method for the vertices (collect) and another one for the edges (signal); (2) the synchronization barrier constraints are relaxed, so asynchronous algorithms can be implemented as well.
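To make the vertex/edge split concrete, here is a minimal sketch of the programming model in Python, using single-source shortest paths with synchronous execution. The names Vertex, Edge, build_graph and run_synchronous are made up for illustration; they are not the framework's actual API (the real system is written in Scala), and the loop below is only my assumption of how a barrier-style execution could look.

```python
import math


class Edge:
    """Edge-side computation: turns the source vertex state into a signal."""

    def __init__(self, source, target, weight=1.0):
        self.source, self.target, self.weight = source, target, weight

    def signal(self, source_state):
        # Shortest paths: offer the target a path through the source.
        return source_state + self.weight


class Vertex:
    """Vertex-side computation: folds incoming signals into a new state."""

    def __init__(self, vid, state=math.inf):
        self.id = vid
        self.state = state
        self.out_edges = []

    def collect(self, signals):
        # Keep the shortest distance seen so far.
        return min([self.state, *signals])


def build_graph(edge_list, source):
    """Builds vertices from (source, target, weight) triples (hypothetical helper)."""
    vertices = {}
    for u, v, w in edge_list:
        vertices.setdefault(u, Vertex(u))
        vertices.setdefault(v, Vertex(v))
        vertices[u].out_edges.append(Edge(u, v, w))
    vertices[source].state = 0.0
    return vertices


def run_synchronous(vertices, max_steps=100):
    """Alternates a signal phase and a collect phase until nothing changes."""
    for _ in range(max_steps):
        inbox = {vid: [] for vid in vertices}
        for v in vertices.values():          # signal phase (edges compute)
            for e in v.out_edges:
                inbox[e.target].append(e.signal(v.state))
        changed = False
        for v in vertices.values():          # collect phase (vertices compute)
            new_state = v.collect(inbox[v.id])
            if new_state != v.state:
                v.state, changed = new_state, True
        if not changed:
            break
    return {vid: v.state for vid, v in vertices.items()}
```

For example, run_synchronous(build_graph([("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)], "a")) settles on distance 3.0 for "c", because the path through "b" beats the direct edge.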

Asynchronous execution works by randomizing over the set of vertices on which signal and collect operations still have to be performed, instead of processing all of them in lockstep.
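A rough sketch of what barrier-free execution with randomized vertex selection might look like, again for shortest paths and in Python. This is my own simplification, not the framework's scheduler: run_async_sssp and the pending set are hypothetical names, everything runs in one thread, and non-negative edge weights are assumed so the loop terminates.

```python
import math
import random


def run_async_sssp(edge_list, source, seed=0):
    """Shortest paths without supersteps: a random pending vertex signals next."""
    rng = random.Random(seed)
    state, out = {}, {}
    for u, v, w in edge_list:
        state.setdefault(u, math.inf)
        state.setdefault(v, math.inf)
        out.setdefault(u, []).append((v, w))
    state[source] = 0.0

    pending = {source}  # vertices whose state changed but who have not signaled yet
    while pending:
        # Randomized choice replaces the global barrier of the synchronous mode.
        u = rng.choice(sorted(pending))
        pending.discard(u)
        for v, w in out.get(u, []):
            signal = state[u] + w
            if signal < state[v]:   # the target collects immediately
                state[v] = signal
                pending.add(v)
    return state
```

Whatever order the random choice produces, the final distances are the same; only the amount of work done along the way differs.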

The main problem is that, at the time of writing, the core system doesn't scale to multiple machines. It uses a shared-memory model on a single powerful machine with a large amount of memory, so it might be very inefficient or hard to use in a distributed setting.

Each vertex keeps a list containing the latest known values of its neighbors. In addition, it has an incoming message queue for messages that have not been read yet. This neighbor list is useful in Giraphx, but it might be memory-inefficient.
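The per-vertex bookkeeping described above could be sketched like this in Python. CachingVertex, deliver and the sum-based collect are assumptions of mine for illustration, not taken from the framework.

```python
from collections import deque


class CachingVertex:
    """Vertex with a latest-value cache per in-neighbor plus an unread-message queue."""

    def __init__(self, vid, base=0.0):
        self.id = vid
        self.base = base
        self.state = base
        self.latest = {}      # in-neighbor id -> most recent value received
        self.inbox = deque()  # (sender id, value) pairs not read yet

    def deliver(self, sender, value):
        # Called by the runtime when a signal arrives; nothing is computed here.
        self.inbox.append((sender, value))

    def collect(self):
        # Drain unread messages into the cache; newer values overwrite older ones.
        while self.inbox:
            sender, value = self.inbox.popleft()
            self.latest[sender] = value
        # Recompute from the cache, so silent neighbors still contribute.
        self.state = self.base + sum(self.latest.values())
        return self.state
```

The cache holds one entry per in-neighbor, so across the graph it costs memory proportional to the number of edges, which is where the memory concern comes from.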

It supports prioritization of vertices or operations. The authors introduce a threshold score that is used to decide whether a vertex should collect its signals or send signals. Using this score, algorithms can be accelerated: in every superstep, signal and collect operations are performed only where the score exceeds a certain threshold.
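Here is a small Python sketch of score-guided execution on a PageRank-style computation. The scoring rule is an assumption on my part (the score is how far a vertex's state has moved since it last signaled), and run_with_threshold is a hypothetical name, so read it as one possible interpretation rather than the authors' definition.

```python
def run_with_threshold(out_edges, threshold=1e-3, damping=0.85, max_steps=1000):
    """PageRank-like iteration where a vertex signals only if its score > threshold.

    out_edges maps every vertex id to the list of its out-neighbors.
    Returns the final states and the number of signal operations performed.
    """
    vertices = list(out_edges)
    state = {v: 1.0 - damping for v in vertices}
    last_signaled = {v: 0.0 for v in vertices}  # state value at the last signal
    latest = {v: {} for v in vertices}          # latest[v][u]: last signal u -> v
    signal_ops = 0

    for _ in range(max_steps):
        to_collect = set()
        for u in vertices:
            # Assumed score: change in state since this vertex last signaled.
            score = abs(state[u] - last_signaled[u])
            if score > threshold and out_edges[u]:
                share = damping * state[u] / len(out_edges[u])
                for v in out_edges[u]:
                    latest[v][u] = share
                    to_collect.add(v)
                    signal_ops += 1
                last_signaled[u] = state[u]
        if not to_collect:
            break  # nobody passed the threshold, so the computation has settled
        for v in to_collect:
            # Only vertices that received something new bother to collect.
            state[v] = (1.0 - damping) + sum(latest[v].values())
    return state, signal_ops
```

On a two-vertex cycle, a threshold of 0.05 stops after far fewer signal operations than a threshold of 0.001, at the price of a less accurate final state. That is the trade-off the score exposes.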


It also provides many small extensions that might give hints for the Maestro implementation. It does not support adding or removing edges or vertices.
