Relationship of Sharding and Replication
Sharding deals with the distribution of namespace responsibilities among the cluster nodes. Replication on the other hand ensures high availability of the namespace by creating copies of the data. Additionally, replication helps in load balancing by typically serving the read traffic. The two aspects are considered together during designing the system -- the placement of the replica is correlated with the key-space sharding such that the client can computes the replica location as a function of the key-space. Such correlation primarily exists in Key-based routing schemes, and to a lesser extent in directory-based look-up schemes where the replica location can be arbitrary.
Consider a concrete example of a ring-style consistent hashing sharding in Amazon Dynamo or Openstack Swift. The key-space is partitioned across the cluster nodes that are each assigned to one or more key ranges. The replication scheme assigns replica placement to the neighbor and the neighbor's neighbor in the ring.
Thus the implication of the correlation is an additional constraint of considering fault domains during key-space sharding. There is also a growing interest to consider network constraints among the sharded nodes to minimize latency overheads due to replication.
No comments:
Post a Comment