Early catch: Before ground realities become trends!: Part II: Nuts-and-bolts of a Scale-out Distributed Storage System

This post covers Data Durability.

Data Durability ensures minimal data loss in the event of hardware failures, component corruption, or software bugs. The two most common approaches are replication, which stores multiple full copies of the data, and erasure coding, which stores the data together with parity fragments. Both of them have pros and cons. Replication imposes overheads w.r.t. space usage (e.g., 3X the capacity), but is cheaper w.r.t. partial-update overheads and the amount of data that must be read during recovery. Erasure coding across nodes has the inverse trade-off: lower space overhead, but costlier partial updates and more data read during recovery.
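
To put rough numbers on this trade-off, here is a minimal sketch comparing 3-way replication with an erasure-coded layout. The 10+4 layout and the function names are illustrative assumptions, not taken from any particular system, and the recovery figure assumes a plain MDS code (such as Reed-Solomon) that reads k surviving fragments to rebuild one lost fragment.

```python
def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Raw capacity consumed per unit of user data."""
    return (data_units + redundancy_units) / data_units


def recovery_read_bytes(lost_bytes: int, sources_needed: int) -> int:
    """Bytes read from surviving nodes to rebuild `lost_bytes` of data.

    Replication needs 1 source (any surviving copy). A plain k+m MDS
    erasure code needs k surviving fragments of the same size.
    """
    return lost_bytes * sources_needed


if __name__ == "__main__":
    GIB = 1024 ** 3
    k, m = 10, 4  # hypothetical erasure-coding layout: 10 data + 4 parity

    # 3-way replication: one primary copy plus two extra copies.
    print("replication overhead:", storage_overhead(1, 2))
    print("erasure coding overhead:", storage_overhead(k, m))

    # Rebuilding 1 GiB that lived on a failed node.
    print("replication recovery reads:", recovery_read_bytes(GIB, 1) // GIB, "GiB")
    print("erasure coding recovery reads:", recovery_read_bytes(GIB, k) // GIB, "GiB")
```

Under these assumptions, replication stores 3X the data but reads 1 GiB to rebuild 1 GiB, while the 10+4 layout stores 1.4X but reads 10 GiB for the same rebuild.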

With the adoption of All-Flash environments, erasure coding is getting a lot of attention in recent research.

The key building-block services are:

• Replica Placement: Replica placement needs to take into account:

– Fault domain-awareness for namespace and replica distribution

– Replica Server Allocation: Deciding the replica servers for each namespace

• Replication Orchestration: Deals with the mechanism of the data replication process itself. Some aspects overlap with data consistency:

– Read-write protocol for replicas

– Coordination (serialization and ordering) of updates to replicas

– State versus operation-based replication

• Replica repair: While writes are committed across a quorum of replicas, a replica can get out of sync and need repair under the following scenarios:

– Offline replica connects back

– Conflicting replica updates, especially in AP systems (i.e., any replica update model without quorum consensus).

• Data Integrity/Scrubbing: This involves storing checksums and reading the disk blocks in a background thread to verify data correctness.

• Geo-redundancy service: Replication across sites. The aspects are similar to replication within a data center, with the additional aspect of network optimization techniques.
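
To make the Replica Placement item above more concrete, here is a minimal sketch of fault-domain-aware replica server allocation. It assumes each server is tagged with a single fault domain (a rack, for instance) and that no two replicas of a namespace may share a domain. The ranking uses rendezvous (highest-random-weight) hashing, which is one possible choice made for this sketch and not something this post prescribes; `place_replicas` and the server names are hypothetical.

```python
import hashlib


def place_replicas(namespace: str, servers: dict[str, str],
                   replication_factor: int = 3) -> list[str]:
    """Pick replica servers for `namespace`, at most one per fault domain.

    `servers` maps a server id to its fault domain (e.g. a rack name).
    """
    def score(server_id: str) -> int:
        # Deterministic pseudo-random weight for this (namespace, server) pair.
        digest = hashlib.sha256(f"{namespace}:{server_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    chosen: list[str] = []
    used_domains: set[str] = set()
    # Walk servers from highest to lowest weight, skipping used domains.
    for server_id in sorted(servers, key=score, reverse=True):
        domain = servers[server_id]
        if domain in used_domains:
            continue
        chosen.append(server_id)
        used_domains.add(domain)
        if len(chosen) == replication_factor:
            return chosen
    raise ValueError("not enough distinct fault domains for the replication factor")


if __name__ == "__main__":
    cluster = {"s1": "rackA", "s2": "rackA", "s3": "rackB",
               "s4": "rackC", "s5": "rackC"}
    print(place_replicas("ns-42", cluster))
```

Because the weights depend only on the namespace and the server id, every node computes the same placement without coordination. Whether that property matters depends on how the system manages its metadata.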

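The Data Integrity/Scrubbing item can be sketched in a similar way. This is a toy in-memory model, assuming a CRC32 checksum stored alongside each block. The `BlockStore` class and `scrub` function are hypothetical names, and a real scrubber would run periodically in a rate-limited background thread and trigger repair from a healthy replica instead of only reporting.

```python
import zlib


class BlockStore:
    """Toy block store that keeps a checksum next to every block."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.checksums: dict[int, int] = {}

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        # Checksum is computed at write time, before the data can rot.
        self.checksums[block_id] = zlib.crc32(data)


def scrub(store: BlockStore) -> list[int]:
    """Re-read every block and return the ids whose checksum no longer matches."""
    corrupted = []
    for block_id in sorted(store.blocks):
        if zlib.crc32(store.blocks[block_id]) != store.checksums[block_id]:
            corrupted.append(block_id)
    return corrupted


if __name__ == "__main__":
    store = BlockStore()
    store.write(0, b"hello")
    store.write(1, b"world")
    store.blocks[1] = b"worle"  # simulate silent corruption on disk
    print("corrupted blocks:", scrub(store))
```

CRC32 is used here only because it ships with the Python standard library. The choice of checksum in a real system is a separate design decision.
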
Early catch: Before ground realities become trends!

Mapping Technology Trends to Enterprise Product Innovation

Wednesday, November 12, 2014
