Early catch: Before ground realities become trends!

Control Taxonomy: Master, Masterless, and Multi-Master

Control Taxonomy defines the approach by which the individual nodes within a scale-out cluster coordinate to manage resources, handle events, and support storage data and metadata operations for the clients. Broadly speaking, there are three different types of Control Taxonomy that are prevalent in scale-out storage and data management systems:

Master Taxonomy: In this taxonomy, there is a single node (called Master) within the cluster with special responsibilities for managing the cluster state, and coordinating activities/tasks (described later). HDFS is an example of this taxonomy.
Masterless Taxonomy: All the nodes within the cluster share equal responsibilities for managing the cluster state and activities. This taxonomy became popular with Amazon's Dynamo paper, and is currently used in systems such as OpenStack Swift and Cassandra.
Multi-Master Taxonomy: This is an evolution of the single-Master taxonomy: the global Master acts as a lightweight coordinator and divides the cluster management tasks among the nodes within the cluster. This is in contrast to the Masterless taxonomy, where there is no single global state maintained by the entire cluster. Ceph, GPFS, and Bigtable are examples of this taxonomy.
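
To make the contrast concrete, here is a minimal Python sketch of how a "where does this path live?" lookup might be routed under each taxonomy. All class and method names are hypothetical, and the hash ring used for the masterless case is an assumption in the spirit of Dynamo-style placement, not a description of how any of the systems named above are actually implemented. It also assumes every node already knows the node list.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class MasterLookup:
    """Master: one node holds the authoritative path -> node table."""

    def __init__(self):
        self.table = {}  # in this sketch, lives only on the master

    def assign(self, path, node):
        self.table[path] = node

    def locate(self, path):
        return self.table[path]  # every lookup goes through the master


class MasterlessLookup:
    """Masterless: any node computes placement from a shared hash ring."""

    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)
        self.points = [point for point, _ in self.ring]

    def locate(self, path):
        # First ring point clockwise from the path's hash, wrapping around.
        i = bisect_right(self.points, _hash(path)) % len(self.ring)
        return self.ring[i][1]


class MultiMasterLookup:
    """Multi-Master: a lightweight coordinator maps namespace prefixes to
    delegate masters; each delegate owns the table for its own prefix."""

    def __init__(self):
        self.delegates = {}  # prefix -> MasterLookup

    def add_delegate(self, prefix):
        self.delegates[prefix] = MasterLookup()
        return self.delegates[prefix]

    def locate(self, path):
        # The coordinator only picks the delegate (longest matching prefix);
        # the delegate answers the actual lookup.
        prefix = max((p for p in self.delegates if path.startswith(p)), key=len)
        return self.delegates[prefix].locate(path)
```

In the master case the table is a single point of coordination; in the masterless case no node holds a table at all; in the multi-master case the coordinator's own state shrinks to the prefix-to-delegate mapping.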

In the rest of this blog, we will focus on defining two key aspects: cluster state and cluster activities/tasks. We will use these details to distinguish between the taxonomies listed above.

"State" is a generic term; for this discussion, it refers to a combination of:

Cluster configuration: Servers and disks that are part of the cluster, their hardware configuration, and their membership state in the form of heartbeats, connectivity, etc.
Namespace metadata: Describes what namespace is being served, and how the namespace maps to current server and disk resources. Typically this metadata is a combination of records persisted on disk and state that is maintained dynamically by scanning.
Operation-specific metadata: Examples include locks, leases, caches, batching/group commit, re-build/repair operations, etc.
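
As a rough illustration, the three kinds of state could be modeled as the following Python data structures. The field names are hypothetical and chosen only to mirror the list above; a real system would track far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterConfig:
    # server id -> list of disk ids attached to that server
    servers: dict = field(default_factory=dict)
    # server id -> time of the last heartbeat received (seconds)
    last_heartbeat: dict = field(default_factory=dict)


@dataclass
class NamespaceMetadata:
    # namespace path -> (server id, disk id) currently serving it
    placement: dict = field(default_factory=dict)


@dataclass
class OperationMetadata:
    # path -> (lease holder, lease expiry time in seconds)
    leases: dict = field(default_factory=dict)
    # disk ids currently being rebuilt or repaired
    repairs: set = field(default_factory=set)


@dataclass
class ClusterState:
    config: ClusterConfig = field(default_factory=ClusterConfig)
    namespace: NamespaceMetadata = field(default_factory=NamespaceMetadata)
    operations: OperationMetadata = field(default_factory=OperationMetadata)


# Example: one server with two disks serving a single namespace path.
state = ClusterState()
state.config.servers["s1"] = ["d1", "d2"]
state.namespace.placement["/vol/a"] = ("s1", "d1")
state.operations.leases["/vol/a"] = ("client-7", 120.0)
```

Seen this way, the taxonomies differ mainly in which node (or nodes) holds the authoritative copy of each of these structures.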

"Tasks" in a scale-out storage system can be divided into the following categories:

Cluster Management tasks: Tracking node membership, network partitions, node recovery, etc.
Metadata management tasks: Maintaining an up-to-date mapping of namespace to resources, space management, cache invalidation, locking, load balancing, data integrity checks, garbage collection, etc.
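
As one example of a cluster management task, the sketch below tracks node membership from heartbeats using a fixed timeout. The class name, the timeout policy, and the explicit `now` argument (passed in to keep the example deterministic) are all assumptions made for illustration; production systems typically use more elaborate failure detectors.

```python
class MembershipTracker:
    """Marks a node as suspected once its heartbeat is older than timeout_s."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node id -> time of last heartbeat (seconds)

    def heartbeat(self, node: str, now: float) -> None:
        # Record that a heartbeat from `node` arrived at time `now`.
        self.last_seen[node] = now

    def live_nodes(self, now: float) -> set:
        # Nodes whose last heartbeat is within the timeout window.
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout_s}

    def suspected_nodes(self, now: float) -> set:
        # Known nodes that have gone silent for longer than the timeout.
        return set(self.last_seen) - self.live_nodes(now)


tracker = MembershipTracker(timeout_s=5.0)
tracker.heartbeat("node-a", now=0.0)
tracker.heartbeat("node-b", now=4.0)
```

Who runs such a tracker is exactly what the taxonomy decides: a single Master, every node on behalf of its peers, or a delegate responsible for a subset of the cluster.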

Mapping Technology Trends to Enterprise Product Innovation

Tuesday, August 13, 2013
