Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, cloud platforms, software-defined, microservices, and DevOps.
Why: We are living in an era of continuous change and low barriers to entry. Net result: a lot of noise!
What: Sharing expertise gained over nearly two decades of extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become widely cited trends and pain points.
How: NOT based on reading tea leaves! Instead, by synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Tuesday, September 2, 2014

Append-only Storage Layout

Append-only is an approach made popular by the Google File System (GFS) and also adopted in HDFS. The basic idea is to allow updates only in the form of appends to an existing file, i.e., data that has already been written is immutable and cannot be modified. Files can, however, be deleted as a whole. A file is typically treated as a collection of blocks (chunks) in which every block except the last one is immutable: full blocks are sealed, and appends only extend the last block.
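
A minimal in-memory sketch of this layout, assuming a toy chunk size of a few bytes (GFS used 64 MB chunks). The class and method names are hypothetical and are not the GFS or HDFS client API; the point is only that sealed chunks are never touched again and that there is no operation for overwriting existing bytes.

```python
from typing import List


class AppendOnlyFile:
    """Toy append-only file (hypothetical names; not the GFS or HDFS client API).

    The file is a list of fixed-size chunks. Full chunks are sealed as immutable
    bytes objects; only the last, partially filled chunk accepts new data.
    """

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.sealed: List[bytes] = []  # full chunks: never modified again
        self.tail = bytearray()        # last chunk: the only writable one

    def size(self) -> int:
        return len(self.sealed) * self.chunk_size + len(self.tail)

    def append(self, data: bytes) -> int:
        """Add data at the current end of file and return the offset it landed at."""
        offset = self.size()
        view = memoryview(data)
        while view:
            room = self.chunk_size - len(self.tail)
            self.tail += view[:room]
            view = view[room:]
            if len(self.tail) == self.chunk_size:
                self.sealed.append(bytes(self.tail))  # seal the full chunk
                self.tail = bytearray()
        return offset

    def read(self, offset: int, length: int) -> bytes:
        whole = b"".join(self.sealed) + bytes(self.tail)
        return whole[offset:offset + length]

    # Deliberately no write_at()/overwrite(): existing bytes cannot be changed.
    # Deletion happens at the file level, e.g. by dropping the whole object.
```

For example, with chunk_size=4, appending b"hello" and then b"abc" returns offsets 0 and 5 and leaves two sealed chunks, b"hell" and b"oabc".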

This approach belongs to the family of protocols that perform out-of-place updates, i.e., new data is written to a different logical address instead of overwriting the old data in place. Other members of this family are WAFL (Write Anywhere File Layout), Log-Structured Merge (LSM) trees, copy-on-write (COW), and version-based storage. Comparing append-only with WAFL, the key difference is that WAFL updates the file metadata to point to the updated data, while append-only expects the application to determine the latest version of a block, typically based on some form of generation number.
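
To make that contrast concrete, here is a hypothetical sketch with two toy stores. AppendOnlyBlocks keeps every version and leaves it to the reader to pick the one with the highest generation number. PointerUpdatingBlocks is a heavily simplified stand-in for the WAFL idea: it also writes out of place, but updates a block map so reads need no search. It ignores WAFL's actual on-disk metadata tree, and all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    block_id: int
    generation: int
    data: bytes


class AppendOnlyBlocks:
    """Append-only style: every update is a new record; nothing is overwritten."""

    def __init__(self) -> None:
        self.log: List[Record] = []

    def update(self, block_id: int, data: bytes) -> None:
        current = self.latest(block_id)
        generation = current.generation + 1 if current else 1
        self.log.append(Record(block_id, generation, data))

    def latest(self, block_id: int) -> Optional[Record]:
        # The reader resolves the current version: highest generation wins.
        candidates = [r for r in self.log if r.block_id == block_id]
        return max(candidates, key=lambda r: r.generation, default=None)


class PointerUpdatingBlocks:
    """Simplified WAFL-style contrast: write out of place, then repoint metadata."""

    def __init__(self) -> None:
        self.storage: List[bytes] = []       # data locations, never overwritten
        self.block_map: Dict[int, int] = {}  # block_id -> index in storage

    def update(self, block_id: int, data: bytes) -> None:
        self.storage.append(data)
        self.block_map[block_id] = len(self.storage) - 1

    def read(self, block_id: int) -> bytes:
        return self.storage[self.block_map[block_id]]
```

In both stores the old data survives; the difference is only where the "which version is current?" decision lives: in the reader's generation comparison, or in the metadata the writer updates.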

A few key characteristics of an append-only model are:
  • Replicas of a file are not guaranteed to be byte-for-byte identical. The system enforces at-least-once semantics, i.e., every replica holds at least one copy of each update, but it may hold more than one because of retries.
  • The append API does not take the end-of-file (EoF) offset as a parameter. Instead, the offset is chosen dynamically by the chunk servers. This allows multiple writers to append to the file concurrently without locking, as long as each append is an atomic operation.
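
Both characteristics can be illustrated with a small single-threaded simulation. It assumes hypothetical names, a fixed 4-byte record format so readers can parse the stream, and a primary replica (replicas[0]) that picks the offset, as in GFS. A failed replica's region is modeled as zero padding, which is a simplification. Concurrency itself is not modeled; the point is that the client never supplies an offset, and that a retry after a partial failure leaves duplicates on some replicas and a hole on another. The reader then filters padding and duplicates using unique record IDs, an approach the GFS paper describes for applications.

```python
from typing import List, Set

RECORD_SIZE = 4  # hypothetical fixed record size so readers can parse the byte stream


class Replica:
    """One replica of a chunk, modeled as a growable byte array."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = bytearray()
        self.fail_next = False  # hypothetical fault-injection flag for the demo

    def write_at(self, offset: int, record: bytes) -> bool:
        if len(self.data) < offset:
            self.data += bytes(offset - len(self.data))  # pad up to the chosen offset
        if self.fail_next:
            self.fail_next = False
            # Simplification: the failed region is left as zero padding.
            self.data[offset:offset + len(record)] = bytes(len(record))
            return False
        self.data[offset:offset + len(record)] = record
        return True


def record_append(replicas: List[Replica], record: bytes, max_retries: int = 3) -> int:
    """The caller passes only the record; the primary (replicas[0]) picks the offset."""
    primary = replicas[0]
    for _ in range(max_retries):
        offset = len(primary.data)  # current end of file, chosen server-side
        # A list, not a generator, so every replica is attempted even if one fails.
        acks = [r.write_at(offset, record) for r in replicas]
        if all(acks):
            return offset
        # Some replica failed: retry at a new offset. Replicas that already
        # succeeded now hold the record twice (at-least-once semantics).
    raise RuntimeError("record append failed after retries")


def read_records(replica: Replica) -> List[bytes]:
    """Reader-side cleanup: skip padding and drop duplicates by record ID."""
    seen: Set[bytes] = set()
    out: List[bytes] = []
    for i in range(0, len(replica.data), RECORD_SIZE):
        rec = bytes(replica.data[i:i + RECORD_SIZE])
        if rec == bytes(RECORD_SIZE) or rec in seen:
            continue
        seen.add(rec)
        out.append(rec)
    return out
```

For example, if the third replica fails once while appending b"R002", the retry lands at offset 8: the primary ends up with R001R002R002, the failed replica with R001, four zero bytes, then R002, and read_records returns [b"R001", b"R002"] for both.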

Append-only is an attractive model because it minimizes the need for synchronization between readers, between readers and writers, and between writers:

  • Reader-reader: Multiple readers can read the data concurrently, since data that has been written never changes.
  • Reader-writer: Writers can append new data while readers continue to read the existing data. For this to work, each append must be atomic, so that readers never see partially written data.
  • Writer-writer: Since writers don't specify an absolute offset, the file system can serialize concurrent appends in any order, as long as all replicas apply them in the same order.
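
The three synchronization cases can be sketched in-process with Python threads. This is an illustration under stated assumptions, not a file system: SharedLog is a hypothetical name, each list entry stands for one whole appended record, and the atomicity of each append relies on CPython executing list.append and list slicing as single operations with respect to other threads. Writers share one short lock that plays the role of the single serialization point choosing the next position; readers take no lock and always get a consistent prefix.

```python
import threading
from typing import List


class SharedLog:
    """Writers serialize only on choosing the next position; readers never lock."""

    def __init__(self) -> None:
        self._records: List[bytes] = []  # entries are never modified once added
        self._lock = threading.Lock()    # writer-writer serialization only

    def append(self, record: bytes) -> int:
        with self._lock:
            position = len(self._records)
            self._records.append(record)  # the whole record becomes visible at once
            return position

    def snapshot(self) -> List[bytes]:
        # Reader: copy the current prefix without locking. Because existing
        # entries never change, an older snapshot stays valid as the log grows.
        return self._records[:]
```

Concurrent writers' records interleave in whatever order the lock grants, but each writer's own records keep their relative order, and any snapshot taken mid-way is a prefix of the final log.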

