Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined, micro-services, DevOps.
Why: We are living in an era of continuous change and a low barrier to entry. Net result: a lot of noise!
What: Sharing my expertise gained over nearly two decades in the skill of extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become cited trends and pain-points.
How: NOT based on reading tea leaves! Instead, synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Wednesday, November 12, 2014

Part III: Nuts-and-bolts of a Scale-out Distributed Storage System

This post covers Data Consistency.

Broadly speaking, Data Consistency represents an agreement between the application developer and the persistence layer. The purpose of this post is to map the observable behavior of consistency to the underlying building blocks/services.

The granularity of guarantees can be on a per-object basis or at a broader namespace granularity, e.g., objects within the same container/directory. Our focus in this post is on independent updates to objects. Transactions, which consist of dependent updates to objects, are dealt with as a separate topic.

There are different models for consistency. The POSIX model requires that reads reflect any data previously written, and that writes are atomic (i.e., the result of overlapping, concurrent writes will reflect a particular order of occurrence). In an eventually consistent system, applications relax the POSIX constraints and settle for weaker guarantees such as read-your-writes or monotonic reads.
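To make read-your-writes and monotonic reads concrete, here is a minimal sketch of how a client session could check them against versioned replicas. The `Replica` and `Session` classes, the integer version counter, and the choice to raise an error on a stale replica are all assumptions made for illustration; a real system might instead retry against another replica or wait for the lagging one to catch up.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica holding one object as a (version, value) pair."""
    version: int = 0
    value: str = ""


@dataclass
class Session:
    """Hypothetical client-side state used to check session guarantees."""
    last_written: int = 0  # highest version this client has written
    last_read: int = 0     # highest version this client has read

    def write(self, replica: Replica, value: str) -> None:
        # Single-writer simplification: each write bumps the version by one.
        replica.version += 1
        replica.value = value
        self.last_written = max(self.last_written, replica.version)

    def read(self, replica: Replica) -> str:
        # Read-your-writes: the replica must have seen this client's last write.
        if replica.version < self.last_written:
            raise RuntimeError("stale replica: violates read-your-writes")
        # Monotonic reads: never observe an older version than one already read.
        if replica.version < self.last_read:
            raise RuntimeError("stale replica: violates monotonic reads")
        self.last_read = replica.version
        return replica.value


primary, lagging = Replica(), Replica()  # 'lagging' has not received the update
writer = Session()
writer.write(primary, "v1")
print(writer.read(primary))  # served: the primary has the client's own write
```

In this sketch, a read by `writer` against `lagging` would be rejected, because that replica has not yet seen the client's own write. A POSIX-style system would hide such lag from the client entirely.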

The key building blocks are:
  • Read-Write Serialization (Single node): This is a single-node perspective for read-write operations on the same object (also referred to as the coherence model). This is analogous to memory coherence models that aim to define the order in which updates will be visible during read-write operations to a given register; there is a vast body of work defining coherence models such as MESI, the Java Memory Model, etc. A commonly used model in storage systems is the taxonomy defined by Lamport (Safe, Regular, and Atomic). POSIX defines read-write exclusion, which roughly translates to Atomic in Lamport’s taxonomy.
  • Write-Write Serialization (Single node): Defines how concurrent updates to the same object are handled. The easiest model from an implementation standpoint is LWW (Last Writer Wins), but it is the least deterministic from the application developer’s perspective, given the asynchrony in message delivery.
  • Serialization of writes across replicas: This captures how the replicas reflect the updates:
    • Scheduling of writes to replicas
    • Ordering of writes to replicas (including byte order fidelity in the replica copies)
    • Handling of partial failures during replica update
  • Serialization of reads across replicas:
    • Read access protocol across replicas
    • Read-write exclusion guarantee on replicas
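The LWW point in the list above can be illustrated with a small sketch. The post does not say which LWW variant is meant, so the two functions here (`apply_arrival_order` and `apply_timestamp_lww`) and the write tuple layout are assumptions for illustration. The first lets whichever write is delivered last overwrite the others. The second picks the highest (timestamp, writer id) pair regardless of delivery order.

```python
from itertools import permutations

# Each write is ((timestamp, writer_id), value); the layout is hypothetical.
writes = [((10, "A"), "from-A"), ((10, "B"), "from-B")]


def apply_arrival_order(delivered):
    """Arrival-order LWW: the write delivered last overwrites earlier ones."""
    value = None
    for _, val in delivered:
        value = val
    return value


def apply_timestamp_lww(delivered):
    """Timestamp LWW: the highest (timestamp, writer_id) wins, in any order."""
    return max(delivered, key=lambda w: w[0])[1]


# Enumerate every possible delivery order of the two concurrent writes.
arrival_outcomes = {apply_arrival_order(p) for p in permutations(writes)}
timestamp_outcomes = {apply_timestamp_lww(p) for p in permutations(writes)}

print(sorted(arrival_outcomes))    # both values are possible final states
print(sorted(timestamp_outcomes))  # one value, but writer A's update is lost
```

With arrival-order LWW, two replicas that receive the same writes in different orders can end up with different values. The timestamp variant converges, but it silently discards one of the concurrent updates, which is why the outcome can surprise the application developer either way.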



