Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined, microservices, DevOps.
Why: We are living in an era of continuous change and a low barrier to entry. Net result: a lot of noise!
What: Sharing my expertise gained over nearly two decades in the skill of extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become cited trends and pain-points.
How: NOT based on reading tea leaves! Instead, synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Tuesday, September 17, 2013

Computable Namespace Sharding


To recap the previous blog post, sharding is the approach by which namespace responsibilities are divided among the nodes in a cluster. This distribution can either be computable or policy-based. Computable means that, given the key of an object, the assigned node can be computed by any other node within the cluster, or potentially even by the client. In contrast, policy-based sharding involves a directory look-up to determine the node responsible for a given object. Policy-based techniques are typically well-suited for Master or Multi-Master taxonomies, while computable sharding is the basis for the Masterless taxonomy.
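
To make the distinction concrete, here is a minimal sketch of the two lookup styles (plain Python with hypothetical node names and paths, not any particular system's implementation): a computable placement that any node or client can evaluate locally, versus a policy-based directory that has to be consulted on every lookup.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def computable_lookup(key: str) -> str:
    """Any node or client can run this locally: hash the key and
    map it onto the node list -- no directory round-trip needed."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# Policy-based: an explicit directory maps key prefixes to nodes, and
# every lookup has to consult whoever owns this table (a Master).
DIRECTORY = {"/logs/": "node-a", "/images/": "node-b", "/tmp/": "node-c"}

def policy_lookup(key: str) -> str:
    for prefix, node in DIRECTORY.items():
        if key.startswith(prefix):
            return node
    raise KeyError(f"no placement policy covers {key}")

print(computable_lookup("/images/cat.jpg"))  # computed, no directory needed
print(policy_lookup("/images/cat.jpg"))      # requires the directory
```
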

A good sharding algorithm is one that distributes resource usage evenly among the nodes, both in terms of incoming requests and storage space utilization. There is no silver bullet; the right choice depends on the data model's properties, infrastructure heterogeneity, and access characteristics.

The key design patterns for computable sharding are:
  •  Range-based
    • This is a simple sharding pattern, used in Memcache, Redis, BigTable, Yahoo's PNUTS, etc.
    • The key-space is pre-divided into ranges that are assigned to individual nodes. When a node crashes, its ranges have to be re-assigned to other nodes within the cluster (see the range-based sketch after this list).
    • If the key is not available at the assigned node, there are no implicit replication "neighbors" that the client can check.
  • Consistent hashing
    • The key is derived by hashing the object name; the resulting key-space is divided into ranges, and servers are assigned one or more ranges. This is a common pattern, used in Cassandra, Dynamo, and Voldemort.
    • The key-space ranges can be considered to form a ring, with each range having a successor and a predecessor. This formalism helps in defining implicit replication relationships, such that when the assigned node is not available, its successor is checked.
    • For uniform load distribution, Cassandra uses the concept of Virtual Nodes: a single physical node is logically divided into many virtual nodes, which are then assigned to different key-space ranges scattered around the ring. This ensures that the load of a single hot region of the key-space is divided across multiple physical hosts (a ring-with-virtual-nodes sketch follows this list).
  •  P2P DHT Algorithms
    • The P2P research on Distributed Hash Tables (DHTs) is relevant in this space -- CAN, Chord, Pastry, Tapestry.
    • Algorithms such as Chord maintain an IP "finger table" on each node, holding the addresses of servers responsible for other parts of the key-space. The goal is to reduce the worst-case number of network re-directions from O(n) to O(log n), where n is the number of nodes (a finger-table sketch follows this list).
  • CRUSH (OpenStack Swift)
    • Swift uses a combination of policy and hash functions to shard the namespace within the cluster. The cluster resources are divided into resource pools that are created for different kinds of storage (disk-based, Flash, etc.). Policies are defined for files to be explicitly specified to go on specific resource pools. The namespace distribution is computed based on policy and hash function
    • To generalize, namespace sharding can be made resource-aware, including awareness of network connectivity (ToR switching), fault domains, etc.      
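
Picking up the forward references above: first, a minimal range-based sketch (hypothetical alphabetical ranges and node names, not modeled on any specific system). Ranges are pre-assigned to nodes, and a crash forces an explicit re-assignment; there is no implicit neighbor to fall back to.

```python
import bisect

# Pre-divided ranges: each entry is (upper bound of the range, owning node).
# Assumes keys start with a letter; a key routes to the first range whose
# upper bound is >= its first character.
RANGE_MAP = [
    ("h", "node-1"),   # keys "a".."h"
    ("p", "node-2"),   # keys "i".."p"
    ("z", "node-3"),   # keys "q".."z"
]

def range_lookup(key: str) -> str:
    bounds = [upper for upper, _ in RANGE_MAP]
    idx = bisect.bisect_left(bounds, key[0].lower())
    return RANGE_MAP[idx][1]

def reassign_ranges(failed_node: str, new_node: str) -> None:
    """When a node crashes, its ranges must be explicitly handed to another node."""
    for i, (upper, node) in enumerate(RANGE_MAP):
        if node == failed_node:
            RANGE_MAP[i] = (upper, new_node)

print(range_lookup("invoice-42"))    # -> node-2
reassign_ranges("node-2", "node-3")  # node-2 crashed; node-3 takes over "i".."p"
print(range_lookup("invoice-42"))    # -> node-3
```
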
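Next, a consistent-hashing ring with virtual nodes, in the spirit of Cassandra/Dynamo but purely illustrative (the host names, vnode count, and use of MD5 are all assumptions). Each physical host contributes several points on the ring, a key is served by the first virtual node clockwise from its hash, and replicas go to the next distinct hosts along the ring.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, hosts, vnodes_per_host=8):
        # Each physical host owns several points (virtual nodes) on the ring,
        # so a hot region of the key-space is spread over multiple hosts.
        self._hosts = list(hosts)
        self._ring = sorted(
            (_hash(f"{host}#vnode{i}"), host)
            for host in self._hosts
            for i in range(vnodes_per_host)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key, replicas=2):
        """Primary host plus (replicas - 1) distinct successors on the ring."""
        replicas = min(replicas, len(self._hosts))
        idx = bisect.bisect(self._points, _hash(key))
        owners = []
        while len(owners) < replicas:
            host = self._ring[idx % len(self._ring)][1]
            if host not in owners:        # skip further vnodes of a chosen host
                owners.append(host)
            idx += 1
        return owners

ring = HashRing(["host-a", "host-b", "host-c"])
print(ring.lookup("user:1234"))   # primary + one replica, e.g. ['host-b', 'host-a']
```
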
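Then a toy Chord-style finger table over a small identifier space (the 6-bit ring and node ids below are made up for illustration). Each node keeps only O(log n) fingers, pointing at the owners of n + 2^i, and a lookup jumps via the farthest useful finger, so it resolves in O(log n) hops instead of walking the ring node by node.

```python
# Toy Chord-style ring: an M-bit identifier space with a handful of nodes.
M = 6                                                # 2^6 = 64 ring positions
NODE_IDS = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]    # hypothetical node ids, sorted

def owner(k):
    """The node that stores key k: first node id >= k, wrapping around."""
    return next((n for n in NODE_IDS if n >= k), NODE_IDS[0])

def successor_node(n):
    """Next node strictly clockwise from node n."""
    return next((m for m in NODE_IDS if m > n), NODE_IDS[0])

def between(x, a, b):
    """True if x lies on the clockwise arc (a, b), exclusive of both ends."""
    return a < x < b if a < b else (x > a or x < b)

def fingers(n):
    """finger[i] points at the owner of n + 2^i: only M entries per node."""
    return [owner((n + 2**i) % 2**M) for i in range(M)]

def lookup(n, k, hops=0):
    """Route a lookup for key k starting at node n, counting network hops."""
    s = successor_node(n)
    if between(k, n, s) or k == s:       # k falls in (n, s]; s is the owner
        return s, hops
    for f in reversed(fingers(n)):       # jump via the farthest useful finger
        if between(f, n, k):
            return lookup(f, k, hops + 1)
    return lookup(s, k, hops + 1)        # fall back to the plain successor

print(fingers(8))      # -> [14, 14, 14, 21, 32, 42]
print(lookup(8, 54))   # -> (56, 2): two finger hops instead of walking the ring
```
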
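Finally, a generic policy-plus-hash sketch in the spirit of the resource-pool idea above (the pool names, policies, and prefix matching are hypothetical; this is not Swift's or Ceph's actual data structures). A policy first selects the resource pool for an object, and a hash then computes the node within that pool.

```python
import hashlib

# Resource pools for different kinds of storage (hypothetical names).
POOLS = {
    "flash": ["flash-node-1", "flash-node-2"],
    "disk":  ["disk-node-1", "disk-node-2", "disk-node-3", "disk-node-4"],
}

# Policy: name prefixes explicitly pinned to a pool; everything else goes to disk.
POLICIES = [("/hot/", "flash"), ("/archive/", "disk")]

def place(name: str) -> str:
    pool = next((p for prefix, p in POLICIES if name.startswith(prefix)), "disk")
    nodes = POOLS[pool]
    digest = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]    # computable within the chosen pool

print(place("/hot/session-cache"))       # lands on a flash node
print(place("/archive/2013/report"))     # lands on a disk node
```
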
Sharding is not a one-time operation. There is constant churn in the system as nodes are added and removed. In addition to uniform load distribution, minimizing this re-distribution churn is a key criterion in the selection of a sharding algorithm.
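
To see why churn matters, the sketch below (again with hypothetical node names) compares naive modulo placement against ring-based placement when a fifth node is added: modulo re-maps the vast majority of keys, while the ring moves roughly only the share of keys that the new node takes over.

```python
import hashlib

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

keys = [f"object-{i}" for i in range(10_000)]

# Naive modulo placement: adding a fifth node re-maps almost every key.
moved_modulo = sum(h(k) % 4 != h(k) % 5 for k in keys)

# Ring placement: a key belongs to the first node point clockwise from its
# hash, so only keys falling into the new node's arcs have to move.
def build_ring(nodes, points_per_node=64):
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(points_per_node))

def ring_owner(ring, key):
    target = h(key)
    for point, node in ring:
        if point >= target:
            return node
    return ring[0][1]                    # wrap around the ring

ring4 = build_ring([f"node-{i}" for i in range(4)])
ring5 = build_ring([f"node-{i}" for i in range(5)])
moved_ring = sum(ring_owner(ring4, k) != ring_owner(ring5, k) for k in keys)

print(f"modulo: {moved_modulo}/{len(keys)} keys moved")   # roughly 80%
print(f"ring:   {moved_ring}/{len(keys)} keys moved")     # roughly 20%
```
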
