Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined, micro-services, DevOps.
Why: We are living in an era of continuous change and low barriers to entry. Net result: a lot of noise!
What: Sharing expertise gained over nearly two decades in the skill of extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become widely cited trends and pain points.
How: NOT based on reading tea leaves! Instead, synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Friday, August 30, 2013


Masterless Taxonomy for Scale-out Storage


In the previous posts, we covered the Master-based taxonomy. The other extreme of the spectrum is the Masterless taxonomy: as the name suggests, there is no single (master) node that controls the entire cluster. In this design pattern, all nodes are symmetric and share responsibility for both data and metadata operations. The responsibility is divided by sharding the namespace among the nodes. A few examples of this taxonomy are Cassandra, Amazon's Dynamo and related projects such as LinkedIn's Voldemort, and OpenStack Swift.

The key reason that makes the Masterless taxonomy attractive is cluster scalability. In contrast to Master-based designs, there is no single point of saturation: the global cluster state is completely distributed among the nodes. Clients connect to the nodes using key-based routing techniques (we will cover these later) that mathematically compute the node responsible for a particular portion of the namespace (in contrast to a directory look-up). In the context of the CAP theorem, masterless systems loosely fall in the AP bucket, i.e., they compromise consistency for availability.
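To make the contrast concrete, here is a minimal sketch (the node names and functions are hypothetical, not any system's API) of a directory look-up versus computing the owner directly from the key. The routing shown here uses the naive modulo scheme; the next paragraph refines it into consistent hashing.

```python
# A minimal sketch contrasting a directory look-up (Master-based style)
# with key-based routing (Masterless style). All names are illustrative.
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]    # hypothetical cluster members

# Master-based style: a central directory, maintained by the master, maps
# each key to its owning node; every lookup is a potential point of saturation.
directory = {"photos/cat.jpg": "node-b"}

def lookup_owner(key: str) -> str:
    return directory[key]                 # requires a round trip to the master

# Masterless style: any client or node derives the owner from the key itself.
def compute_owner(key: str) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]     # simplest scheme; refined below

print(lookup_owner("photos/cat.jpg"), compute_owner("photos/cat.jpg"))
```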

As mentioned earlier, the namespace is sharded among the nodes within the cluster. Consistent hashing is the most popular approach: the namespace is hashed into an n-bit address space (0 to 2^n - 1), and the nodes are assigned regions within this key-space. Clients can contact any node within the cluster, which then computes the hash value to locate the server responsible for that portion of the key-space. Consistent hashing ensures a minimal amount of key-space re-distribution in the event of node additions or failures. Several P2P techniques such as Chord, CAN, and Pastry can be used. Ceph uses another variant called CRUSH (which also factors in placement policies), while OpenStack Swift uses a ring built on consistent hashing. The simplest form of distribution is a modulo function, i.e., hash(key) modulo num_nodes to find the responsible node. This distribution is sensitive to the number of nodes, which changes constantly in large scale-out systems.
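Below is a minimal consistent-hashing sketch, assuming one token per node (production systems such as Cassandra and Swift use many virtual nodes or partitions per physical node, and the class and node names here are illustrative). It demonstrates that adding a node re-assigns only the keys in one arc of the ring, whereas the naive modulo scheme would remap most of them.

```python
# A minimal consistent-hashing ring: each node owns the arc of key-space
# ending at its token; a key belongs to the first token clockwise from its hash.
import hashlib
from bisect import bisect_right

M = 32                                    # ring covers 0 .. 2^M - 1

def hash_to_ring(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** M)

class Ring:
    def __init__(self, nodes):
        self.tokens = sorted((hash_to_ring(n), n) for n in nodes)
        self.points = [t for t, _ in self.tokens]

    def owner(self, key: str) -> str:
        # First token at or after the key's hash, wrapping around the ring.
        idx = bisect_right(self.points, hash_to_ring(key)) % len(self.points)
        return self.tokens[idx][1]

keys = [f"obj-{i}" for i in range(10000)]
before = Ring(["node-a", "node-b", "node-c", "node-d"])
after = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])

moved = sum(1 for k in keys if before.owner(k) != after.owner(k))
print(f"{moved} of {len(keys)} keys change owner after adding node-e")
# Only the keys in node-e's arc move; with hash(key) % num_nodes,
# most keys would have changed owner.
```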

Let's look at some of the key workflows:


  • Read workflow: 
    • As described earlier, the location of the owning node is computed. The computation can be done by the client (requiring some client-side code to be installed) or, more commonly, by contacting any node within the cluster.
  • Writes/Space Allocation:
    • Same as reads: the node assigned the key-space is responsible for all the metadata and resource management, including space allocation on its local resources.
    • It contacts its neighbors to create replicas of the data. For a given node, the location of its replicas can also be computed using a ring-based protocol such as Chord (see the sketch after this list).
    • All locks/reservations are maintained by this node. 
  • ...<to be continued>...
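The sketch below ties the read and write paths together, assuming a replication factor of 3 and successor-style replica placement on a simple logical ring of nodes. The node names, coordinator behavior, and in-memory stores are illustrative assumptions, not the API of any particular system.

```python
# A minimal end-to-end sketch of the read and write workflows above.
import hashlib

NODES = sorted(["node-a", "node-b", "node-c", "node-d", "node-e"])
RF = 3                                   # replicas per key
stores = {n: {} for n in NODES}          # stand-in for each node's local storage

def owner_index(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % len(NODES)

def preference_list(key: str):
    """The owning node plus its next RF-1 successors on the logical ring."""
    i = owner_index(key)
    return [NODES[(i + r) % len(NODES)] for r in range(RF)]

def write(key: str, value) -> None:
    # Any node can coordinate: compute the owner, store locally, then push
    # replicas to the ring neighbors. Locks/reservations for this key would
    # also be maintained by the owning node.
    for node in preference_list(key):
        stores[node][key] = value

def read(key: str):
    # Read from the first replica; quorum reads are a common refinement
    # when stronger consistency is required.
    return stores[preference_list(key)[0]][key]

write("user:42", {"name": "Ada"})
print(read("user:42"), "->", preference_list("user:42"))
```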
