
Vector Clocks

One of Riak's central goals is high availability. It was built as a multi-node system in which any node is capable of receiving requests without requiring that each node participate in each request. In a system like this, it's important to be able to keep track of which version of a value is the most current. This is where vector clocks come in.

Vector Clocks and Relationships Between Objects

All Riak objects are stored in a location defined by the object's bucket and key, as well as by the bucket type defining the bucket's properties. It is possible to configure Riak to ensure that only one copy of an object ever exists in a specific location. This will ensure that at most one object is returned when a read is performed on a bucket type/bucket/key location (and no objects if Riak returns not found).

If Riak is configured this way, Riak may still make use of vector clocks behind the scenes to make intelligent decisions about which replica of an object should be deemed the most recent, but in that case vector clocks will be a non-issue for clients connecting to Riak.

Siblings

It is also possible to configure Riak to store multiple objects in a single key, i.e. for an object to have different values on different nodes. Objects stored this way are called siblings. You can instruct Riak to allow sibling creation by setting the allow_mult bucket property to true for a specific bucket, preferably using bucket types.
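As a rough illustration of what enabling allow_mult looks like from a client, the Python sketch below builds (but does not send) a request against Riak's HTTP bucket-properties endpoint. It assumes the HTTP interface is listening on its default port 8098 and that a bucket type already exists; the names siblings_allowed and carts and the helper allow_mult_request are hypothetical, chosen only for this example.

```python
import json
import urllib.request


def allow_mult_request(host: str, bucket_type: str, bucket: str,
                       allow_mult: bool = True,
                       port: int = 8098) -> urllib.request.Request:
    """Build a PUT request that sets allow_mult on one bucket.

    Hypothetical helper for illustration; host, port, and names are assumptions.
    """
    url = f"http://{host}:{port}/types/{bucket_type}/buckets/{bucket}/props"
    body = json.dumps({"props": {"allow_mult": allow_mult}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Build the request for a hypothetical bucket type and bucket.
req = allow_mult_request("127.0.0.1", "siblings_allowed", "carts")

# urllib.request.urlopen(req) would send it to a running node; it is left out
# here so the sketch runs without a cluster.
```

Setting the property on the bucket type itself, rather than on individual buckets, is usually the tidier choice, since every bucket of that type then inherits it.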

This is where vector clocks come in. Vector clocks are metadata attached to every Riak object that enable Riak to determine the causal relationship between two objects. They are not human readable and look something like this:

a85hYGBgzGDKBVIcR4M2cgczH7HPYEpkzGNlsP/VfYYvCwA=

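A number of important aspects of the relationship between object replicas can be determined using vector clocks: whether one object is a direct descendant of the other, whether both objects are direct descendants of a common parent, or whether the objects are unrelated in recent heritage.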
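To make these relationships concrete, here is a minimal Python sketch of how two vector clocks can be compared. It models a clock as a plain mapping from an actor ID to an update counter, which is the textbook representation and not Riak's encoded format; the names VClock, descends, and relationship are assumptions made for this example.

```python
from typing import Dict

# Illustrative model only: actor id -> number of updates seen from that actor.
VClock = Dict[str, int]


def descends(a: VClock, b: VClock) -> bool:
    """Return True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relationship(a: VClock, b: VClock) -> str:
    """Classify the causal relationship between two clocks."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a descends from b"
    if b_covers_a:
        return "b descends from a"
    # Neither side has seen all of the other's updates: a genuine conflict.
    return "concurrent"


parent = {"node1": 1}
child = {"node1": 2}                 # one more update by node1
left = {"node1": 1, "node2": 1}      # parent, then updated via node2
right = {"node1": 1, "node3": 1}     # parent, then updated via node3

print(relationship(child, parent))   # a descends from b
print(relationship(left, right))     # concurrent
```

In this model, left and right share parent as a common ancestor but neither descends from the other, which is the situation that produces siblings when allow_mult is enabled.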

Behind the scenes, Riak uses vector clocks as an essential element of its active anti-entropy subsystem and of its automatic read repair capabilities.

From the standpoint of application development, the difficulty with siblings is that they, by definition, conflict with one another. When an application attempts to read an object that has siblings, multiple values will be stored in the location where the application is looking. This means that the application will need a strategy for conflict resolution.
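What such a strategy looks like depends on the data. The Python sketch below shows two common approaches under simple assumptions: merging set-like values by union, and keeping only the most recently modified value. The Sibling class is a hypothetical stand-in for whatever a client library returns for each conflicting value; it is not part of any Riak client API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Sibling:
    """Hypothetical stand-in for one conflicting value returned by a read."""
    items: FrozenSet[str]
    last_modified: float  # seconds since the epoch


def resolve_by_union(siblings: List[Sibling]) -> Sibling:
    """Merge siblings by taking the union of their items (keeps every addition)."""
    if not siblings:
        raise ValueError("nothing to resolve")
    merged = frozenset().union(*(s.items for s in siblings))
    newest = max(s.last_modified for s in siblings)
    return Sibling(items=merged, last_modified=newest)


def resolve_by_latest(siblings: List[Sibling]) -> Sibling:
    """Keep the most recently modified sibling (simple, but discards the rest)."""
    if not siblings:
        raise ValueError("nothing to resolve")
    return max(siblings, key=lambda s: s.last_modified)


# Two conflicting versions of a shopping cart, written on different nodes.
cart_a = Sibling(items=frozenset({"book", "pen"}), last_modified=100.0)
cart_b = Sibling(items=frozenset({"book", "lamp"}), last_modified=105.0)

merged = resolve_by_union([cart_a, cart_b])
latest = resolve_by_latest([cart_a, cart_b])
```

Union suits data where additions must never be lost, at the cost of possibly resurrecting removed items; last-modified is simpler but silently drops the other values. Whichever strategy is chosen, the resolved value is typically written back together with the vector clock obtained from the read, so that Riak can tell the new write supersedes the siblings.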

More Information

Additional information on vector clocks: