Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined, microservices, DevOps.
Why: We are living in an era of continuous change and low barriers to entry. Net result: a lot of noise!
What: Sharing my expertise gained over nearly two decades in the skill of extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become cited trends and pain-points.
How: NOT based on reading tea leaves! Instead, synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Monday, September 2, 2013

Masterless taxonomy continued... 

In this post, we continue with the design patterns employed in various workflows in a Masterless scale-out storage system:


  • Data locking  
    • The node responsible for the key-space manages the associated locks requested by the clients. The granularity of locks can be byte-range, file/object-level, or directory-level (as in the other taxonomies).
    • The lock state is typically maintained in-memory, and refreshed periodically using client-side leases. If the node crashes, the clients need to request the locks again.
  • Distributed Transactions 
    • In a Masterless system, a transaction across multiple objects requires coordination among the individual nodes responsible for the key-spaces. This is much more difficult to accomplish than in a Master-based taxonomy, so most scale-out systems provide transaction guarantees only at the granularity of a single row or object.
  • Replication
    • Given the key-based routing in Masterless systems, the replicas have to be placed such that their locations can be computed from the primary copy, i.e., replica keyID = f(primary keyID).
    • Dynamo's Ring-based approach places replicas on the successor and the successor's successor in the key-space, i.e., if the primary node is down, the client communicates with the node responsible for the next contiguous key-space.
  • Failure Detection
    • Node failures are typically discovered using gossip-based techniques, instead of a centralized coordinator heartbeating the individual nodes. 
  • Distributed Node Recovery
    • As mentioned earlier, a node in a Masterless system is responsible for both the data and the metadata.
    • Replication (with quorum semantics) takes care of the data. However, there might be transient in-flight operations at the time of the crash, e.g., space allocation for a new file/object.
    • Typically, the in-flight operations are logged in a WAL (Write Ahead Log) that is replicated (similar to normal data). At the time of recovery, the task of parsing the WAL records can be distributed among the nodes (will be covered in future posts).   
  • Maintenance Daemons
    • In a Masterless system, there is no global state; errors in the data and metadata would otherwise be discovered only when clients access them. As such, background daemon threads are used to constantly scrub data, load-balance, and create new replicas as required. Note that background daemons are also used in Master-based taxonomies; for instance, load balancing can be done by the Master, since it knows the load on the individual nodes and can easily redistribute the data-to-node assignment.
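
The lease-based locking described in the first bullet can be sketched as follows. This is a minimal, illustrative in-memory lock table for the key-space one node owns; the class name, parameters, and injected clock are assumptions for the example, not any particular product's API:

```python
import time

class LeaseLockManager:
    """In-memory lock table for the key-space this node owns (sketch).

    Locks are granted with a lease; a client must renew before the lease
    expires, otherwise the lock is silently reclaimed. All state lives in
    memory, so if the node crashes, clients must re-acquire their locks.
    """

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock                 # injectable for testing
        self.locks = {}                    # key -> (client_id, expiry_time)

    def acquire(self, key, client_id):
        holder = self.locks.get(key)
        now = self.clock()
        # Grant if free, expired, or re-acquired by the same client.
        if holder is None or holder[1] <= now or holder[0] == client_id:
            self.locks[key] = (client_id, now + self.lease_seconds)
            return True
        return False  # held by another client with a live lease

    def renew(self, key, client_id):
        holder = self.locks.get(key)
        if holder and holder[0] == client_id and holder[1] > self.clock():
            self.locks[key] = (client_id, self.clock() + self.lease_seconds)
            return True
        return False

    def release(self, key, client_id):
        holder = self.locks.get(key)
        if holder and holder[0] == client_id:
            del self.locks[key]
```

The same pattern works for byte-range or directory locks; only the key granularity changes.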
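
The single-row/object transaction guarantee mentioned above is often exposed as a versioned compare-and-swap, which needs no cross-node coordination. The class and method names below are hypothetical, a sketch of the primitive rather than any specific store's API:

```python
import threading

class ObjectStoreNode:
    """A single node offering per-object atomicity only (sketch).

    A multi-object transaction would require coordination (e.g., 2PC)
    across the nodes owning each key-space, which is what Masterless
    systems typically avoid.
    """

    def __init__(self):
        self._data = {}                # key -> (value, version)
        self._lock = threading.Lock()  # serializes updates on this node

    def compare_and_swap(self, key, expected_version, new_value):
        # Atomic only for one object on one node.
        with self._lock:
            value, version = self._data.get(key, (None, 0))
            if version != expected_version:
                return False, version            # stale caller
            self._data[key] = (new_value, version + 1)
            return True, version + 1
```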
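
The replica-placement relation replica keyID = f(primary keyID) can be sketched for a Dynamo-style ring, where replicas land on the distinct successors of the primary. The hash choice and names here are illustrative assumptions:

```python
import bisect
import hashlib

class Ring:
    """Dynamo-style ring placement (sketch): replicas go to the next
    distinct successor nodes of the primary, so replica locations are
    computable from the key alone."""

    def __init__(self, nodes):
        # Node position on the ring derived from its name (illustrative).
        self._ring = sorted(
            (int(hashlib.md5(n.encode()).hexdigest(), 16), n) for n in nodes
        )

    def preference_list(self, key, n_replicas=3):
        kid = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect.bisect(self._ring, (kid, ""))
        out = []
        for i in range(len(self._ring)):
            node = self._ring[(idx + i) % len(self._ring)][1]
            if node not in out:
                out.append(node)
            if len(out) == n_replicas:
                break
        return out  # out[0] is the primary; the rest are successors
```

If the primary fails, re-running `preference_list` against the surviving membership yields the next contiguous key-space owner, which is exactly the failover behavior described above.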
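
A toy version of the gossip-based failure detection bullet, loosely following a heartbeat-counter scheme: each node bumps its own counter, merges tables with random peers, and suspects any peer whose counter stops advancing. All names and thresholds here are assumptions for illustration:

```python
class GossipNode:
    """Simplified gossip-based failure detection (sketch)."""

    def __init__(self, name, peers, fail_after=5):
        self.name = name
        self.fail_after = fail_after
        # peer -> [heartbeat counter, rounds since counter last advanced]
        self.heartbeats = {p: [0, 0] for p in peers}
        self.heartbeats[name] = [0, 0]

    def tick(self):
        """One local round: bump own counter, age all peer entries."""
        self.heartbeats[self.name][0] += 1
        for p, entry in self.heartbeats.items():
            if p != self.name:
                entry[1] += 1

    def merge(self, other_table):
        """Merge a gossiped table: newer counters reset staleness."""
        for p, (counter, _) in other_table.items():
            mine = self.heartbeats.setdefault(p, [counter, 0])
            if counter > mine[0]:
                mine[0], mine[1] = counter, 0

    def suspected_failed(self):
        return [p for p, (c, stale) in self.heartbeats.items()
                if p != self.name and stale >= self.fail_after]
```

There is no central coordinator here: every node reaches the same suspicion independently, just by exchanging tables.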
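
Recovery of in-flight operations from the replicated WAL might look like the following sketch: operations that reached a commit record are re-applied, and operations without one are discarded. The record fields (`txid`, `op`, etc.) are hypothetical:

```python
def replay_wal(wal_records, store):
    """Replay a node's Write-Ahead Log after a crash (sketch).

    Each record describes an in-flight operation, e.g., space allocation
    for a new object. Operations whose "commit" record made it into the
    log are redone; uncommitted ones are implicitly rolled back.
    """
    committed = {r["txid"] for r in wal_records if r["op"] == "commit"}
    for r in wal_records:
        # Redo only allocations that were committed before the crash.
        if r["op"] == "alloc" and r["txid"] in committed:
            store[r["object"]] = {"blocks": r["blocks"]}
    return store
```

Since the WAL is replicated like normal data, this replay can itself be sharded, each surviving node parsing the records for the key-spaces it takes over.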
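
A single pass of a background scrub daemon could look like this sketch. `replica_fetch` and `repair` are hypothetical callbacks into the storage layer; the point is that corruption is found proactively, not at client access time:

```python
import hashlib

def scrub_pass(objects, replica_fetch, repair):
    """One background scrub pass (sketch).

    For each object, recompute the checksum of every replica and compare
    it against the stored checksum; mismatched replicas are re-created
    from a healthy copy via `repair`.
    """
    repaired = []
    for obj, expected in objects.items():
        for node, data in replica_fetch(obj):
            if hashlib.sha256(data).hexdigest() != expected:
                repair(obj, node)          # rebuild from a good replica
                repaired.append((obj, node))
    return repaired
```

The same loop structure serves the other maintenance tasks (re-replication after node loss, load balancing), just with different per-object checks.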
