This post covers Data Durability.
Data durability ensures minimal data loss in the event of hardware failures,
component corruption, and software bugs. The most common approaches to data
durability are creating multiple full copies of the data (replication) and
erasure coding (encoding the data into fragments plus parity fragments spread
across nodes). Both have pros and cons. Replication imposes overheads w.r.t.
space usage (e.g., 3X the capacity for three copies), but is cheaper w.r.t.
partial-update overheads and the amount of data that must be read during
recovery. Erasure coding across nodes has the inverse trade-offs: lower space
overhead, but costlier partial updates and more data read during recovery.
With the adoption of All-Flash environments, erasure coding
is getting a lot of attention in recent research.
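To make the replication versus erasure coding trade-off concrete, here is a minimal Python sketch. It uses the simplest possible erasure code, a single XOR parity block over k data blocks; production systems more commonly use Reed-Solomon codes with several parity blocks, so treat this only as an illustration. All function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: list[bytes]) -> list[bytes]:
    """Append one XOR parity block to k data blocks, giving a k+1 stripe."""
    return data + [xor_blocks(data)]


def recover(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (None) by XOR-ing the k survivors."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can only rebuild one lost block")
    rebuilt = list(stripe)
    if missing:
        # Recovery reads all k surviving blocks, unlike replication (one copy).
        rebuilt[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return rebuilt


def space_overhead_replication(copies: int) -> float:
    """Raw capacity used per byte of user data with full copies."""
    return float(copies)


def space_overhead_erasure(k: int, m: int) -> float:
    """Raw capacity used per byte of user data with k data + m parity blocks."""
    return (k + m) / k


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # k = 4 data blocks
    stripe = encode(data)
    stripe[1] = None  # lose one data block
    print(recover(stripe)[:4] == data)  # True
    print(space_overhead_replication(3), space_overhead_erasure(4, 1))  # 3.0 1.25
```

With k = 4 the stripe costs 1.25X the capacity instead of 3X, but rebuilding one lost block reads four surviving blocks, whereas a lost replica is rebuilt by copying a single block. Single parity also tolerates only one lost block, versus two for 3X replication.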
The key building-block services are:
• Replica Placement: Replica placement needs to take into account:
– Fault-domain awareness for namespace and replica distribution
– Replica Server Allocation: deciding which servers host the replicas of a namespace
• Replication Orchestration: Deals with the actual mechanism of the data
replication process. There are overlapping aspects with data consistency:
– Read-write protocol for replicas
– Coordination (serialization and ordering) of updates to replicas
– State-based versus operation-based replication
• Replica Repair: While writes are committed across a quorum of replicas, a
replica can get out of sync and need repair in the following scenarios:
– An offline replica reconnects
– Conflicting replica updates, especially in AP systems (i.e., any replica
update model without quorum consensus)
• Data Integrity/Scrubbing: This involves storing checksums and reading the disk
blocks in a background thread to verify them against those checksums, so that
silent corruption is detected.
• Geo-redundancy Service: Replication across sites. The aspects are similar to
replication within a data center, with the additional need for network
optimization techniques, since links between sites are typically slower and
costlier than links within a data center.
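For the Replica Placement service, fault-domain awareness and server allocation can be sketched together. This minimal Python example assumes each server carries a single fault-domain label (say, a rack) and ranks servers per namespace with rendezvous (highest-random-weight) hashing, a common technique chosen here for illustration; the post does not prescribe an algorithm, and `Server` and `place_replicas` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    fault_domain: str  # hypothetical label, e.g. a rack or availability zone


def _score(namespace: str, server: Server) -> int:
    """Deterministic pseudo-random weight for a (namespace, server) pair."""
    digest = hashlib.sha256(f"{namespace}/{server.name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(namespace: str, servers: list[Server], replicas: int) -> list[Server]:
    """Choose `replicas` servers for a namespace, at most one per fault domain."""
    if replicas <= 0:
        return []
    # Rendezvous hashing: each namespace gets its own stable server ranking,
    # which spreads namespaces across the cluster.
    ranked = sorted(servers, key=lambda s: _score(namespace, s), reverse=True)
    chosen: list[Server] = []
    used_domains: set[str] = set()
    for server in ranked:
        if server.fault_domain in used_domains:
            continue  # a second replica here would share a failure with the first
        chosen.append(server)
        used_domains.add(server.fault_domain)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("fewer distinct fault domains than requested replicas")


if __name__ == "__main__":
    cluster = [
        Server("s1", "rack-a"), Server("s2", "rack-a"),
        Server("s3", "rack-b"), Server("s4", "rack-c"),
    ]
    for s in place_replicas("users", cluster, 3):
        print(s.name, s.fault_domain)
```

Because the ranking depends only on the namespace and server names, every node computes the same placement without coordination, and adding a server moves only the namespaces that now rank it highest.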
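For Replication Orchestration, the read-write protocol and update coordination can be illustrated with a toy quorum scheme. This Python sketch assumes a single coordinator that serializes updates by assigning increasing version numbers, N replicas, and quorum sizes with R + W > N so every read quorum overlaps every write quorum. Whole values are shipped, i.e. state-based replication. Knowing in advance which replicas are up is a simplification, and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    up: bool = True
    # key -> (version, value); the whole value is shipped (state-based)
    store: dict[str, tuple[int, str]] = field(default_factory=dict)


class QuorumCoordinator:
    """Toy single-coordinator quorum protocol over N replicas (hypothetical)."""

    def __init__(self, replicas: list[Replica], w: int, r: int) -> None:
        if r + w <= len(replicas):
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.replicas, self.w, self.r = replicas, w, r
        self.next_version = 0  # one coordinator gives a total order of updates

    def write(self, key: str, value: str) -> bool:
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.w:
            return False  # refuse up front rather than leave a partial write
        self.next_version += 1
        for rep in live:
            rep.store[key] = (self.next_version, value)
        return True

    def read(self, key: str) -> Optional[str]:
        live = [rep for rep in self.replicas if rep.up][: self.r]
        if len(live) < self.r:
            raise RuntimeError("read quorum unavailable")
        found = [rep.store[key] for rep in live if key in rep.store]
        # Highest version wins; quorum overlap means it includes the last
        # successful write.
        return max(found)[1] if found else None


if __name__ == "__main__":
    reps = [Replica(), Replica(), Replica()]
    coord = QuorumCoordinator(reps, w=2, r=2)  # N=3, W=2, R=2
    reps[2].up = False
    coord.write("k", "v1")                     # lands on replicas 0 and 1
    reps[0].up, reps[2].up = False, True       # a different pair is now live
    print(coord.read("k"))                     # v1, via replica 1 in the overlap
```

Note that replica 2 is left stale after this run; bringing it back in sync is exactly the job of the Replica Repair service.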
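For the Replica Repair service, both scenarios can be sketched with a one-way anti-entropy pass. This Python example uses vector clocks, a technique common in Dynamo-style AP systems but not named in the post, to tell a stale replica (safe to overwrite) apart from concurrent, conflicting updates (which need resolution). The data layout and function names are assumptions for illustration.

```python
from __future__ import annotations

from typing import Dict, List, Tuple

VectorClock = Dict[str, int]     # replica id -> number of updates it has applied
Entry = Tuple[VectorClock, str]  # (clock, value)


def _covers(a: VectorClock, b: VectorClock) -> bool:
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    a_covers_b, b_covers_a = _covers(a, b), _covers(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    return "concurrent"  # neither side saw the other's update: a conflict


def repair_from_peer(local: Dict[str, Entry], peer: Dict[str, Entry]) -> List[str]:
    """Pull newer entries from `peer` into `local` (hypothetical helper).

    Returns the keys whose versions are concurrent; these need conflict
    resolution, e.g. an application-level merge or last-writer-wins.
    """
    conflicts: List[str] = []
    for key, (peer_clock, peer_value) in peer.items():
        if key not in local or compare(local[key][0], peer_clock) == "b_newer":
            # Missed while offline, or simply stale: adopt the peer's version.
            local[key] = (dict(peer_clock), peer_value)
        elif compare(local[key][0], peer_clock) == "concurrent":
            conflicts.append(key)
    return conflicts


if __name__ == "__main__":
    rejoined = {"a": ({"r1": 1}, "old")}  # replica that was offline
    healthy = {"a": ({"r1": 2}, "new"), "b": ({"r2": 1}, "x")}
    print(repair_from_peer(rejoined, healthy))  # []
    print(rejoined["a"][1], rejoined["b"][1])   # new x
```

A full repair would run the pass in both directions; keys reported as conflicts are left untouched so that a resolution policy can decide their final value.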
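For Data Integrity/Scrubbing, the checksum-plus-background-thread pattern can be sketched as follows. This Python example keeps blocks in memory as a stand-in for disk, stores a CRC32 per block at write time (the post does not name a checksum; CRC32 is an assumption), and re-verifies all blocks periodically in a daemon thread. `BlockStore` and `start_scrubber` are hypothetical names.

```python
from __future__ import annotations

import threading
import zlib
from typing import Callable


class BlockStore:
    """In-memory stand-in for disk blocks, with a CRC32 kept per block."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.checksums: dict[int, int] = {}
        self.lock = threading.Lock()

    def write(self, block_id: int, data: bytes) -> None:
        with self.lock:
            self.blocks[block_id] = data
            self.checksums[block_id] = zlib.crc32(data)  # stored at write time

    def scrub(self) -> list[int]:
        """Re-read every block; return ids whose checksum no longer matches."""
        with self.lock:
            snapshot = list(self.blocks.items())
            expected = dict(self.checksums)
        return [bid for bid, data in snapshot if zlib.crc32(data) != expected[bid]]


def start_scrubber(store: BlockStore, interval_s: float,
                   on_corrupt: Callable[[int], None]) -> threading.Event:
    """Run scrub passes in a daemon thread until the returned Event is set."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            for block_id in store.scrub():
                on_corrupt(block_id)  # e.g. re-fetch the block from a healthy replica
            stop.wait(interval_s)

    threading.Thread(target=loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    store = BlockStore()
    store.write(0, b"hello")
    store.write(1, b"world")
    store.blocks[1] = b"w0rld"  # simulate silent corruption (bit rot)
    print(store.scrub())        # [1]
```

Scrubbing only detects corruption; restoring the bad block relies on the redundancy (replicas or parity) described earlier in the post.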
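For the Geo-redundancy Service, the post does not say which network optimizations it has in mind. One widely used pair is shipping only changed blocks and compressing the payload, sketched below in Python. It assumes the remote site has already sent the SHA-256 hashes of its fixed-size blocks (a fixed-offset simplification of rsync-style block comparison); the block size, wire format, and function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import struct
import zlib

BLOCK_SIZE = 4096  # hypothetical block size


def block_hashes(data: bytes) -> list[bytes]:
    """SHA-256 of each fixed-size block; the remote site sends these to us."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def make_delta(new: bytes, remote_hashes: list[bytes]) -> bytes:
    """Encode only the blocks that differ from the remote copy, then compress."""
    parts = [struct.pack(">Q", len(new))]  # total length lets the remote resize
    for idx, h in enumerate(block_hashes(new)):
        if idx >= len(remote_hashes) or remote_hashes[idx] != h:
            chunk = new[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE]
            parts.append(struct.pack(">II", idx, len(chunk)) + chunk)
    return zlib.compress(b"".join(parts))


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Rebuild the new version at the remote site from its old copy + delta."""
    raw = zlib.decompress(delta)
    (total,) = struct.unpack_from(">Q", raw, 0)
    buf = bytearray(old[:total].ljust(total, b"\0"))  # truncate or zero-extend
    pos = 8
    while pos < len(raw):
        idx, length = struct.unpack_from(">II", raw, pos)
        pos += 8
        start = idx * BLOCK_SIZE
        buf[start:start + length] = raw[pos:pos + length]
        pos += length
    return bytes(buf)


if __name__ == "__main__":
    old = bytes(range(256)) * 64  # 16 KiB replica at the remote site
    new = bytearray(old)
    new[5000] ^= 0xFF             # one byte changed locally
    delta = make_delta(bytes(new), block_hashes(old))
    print(apply_delta(old, delta) == bytes(new), len(delta) < len(new))
```

Here only one 4 KiB block (compressed) crosses the link instead of the full 16 KiB. Real systems often add batching and rolling hashes, which this sketch omits.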