Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined, micro-services, DevOps.
Why: We are living in an era of continuous change and low barriers to entry. Net result: a lot of noise!
What: Sharing my expertise, gained over nearly two decades, in extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become widely cited trends and pain points.
How: NOT based on reading tea leaves! Instead, by synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Tuesday, June 4, 2013

Three Pillars of a Scale-out Storage Architecture


Given the data deluge, customers are actively exploring scale-out solutions for their enterprise storage architectures. Web 2.0 companies such as Google, Facebook, and LinkedIn have been forerunners in managing explosive data growth for their respective applications. Their secret sauce has been custom "Shared Nothing Architectures" that pool together storage from the local disks of individual servers within the cluster.

The design space of Shared Nothing storage architectures is huge! Several architectural choices need to be made, such as sharding, local persistence, metadata management, replication, fault tolerance, transaction support, consensus management, ... <you get the picture>. Existence proofs of these design choices can be found in emerging distributed file systems, traditional HPC file systems, NoSQL and NewSQL systems, in-memory data grids, and the Cloud storage architectures published by Google, Amazon, and others.

In my opinion, three core building blocks essentially dictate the "personality" of a scale-out storage architecture at the 10,000 ft. level. In other words, before sinking your teeth into all the fun low-level pieces of the architecture, it is important to understand these building blocks (whether you are an architect or an administrator looking to deploy these solutions):
  1. Control Taxonomy: How the data and metadata activities are divided among the nodes of the cluster  
  2. Data Sharding: How the namespace (block/file/object/KV/document/..) is mapped across hardware resources within the cluster 
  3. Replication strategy: How data redundancy is enforced, its impact on CRUD operations, and the breadth of supported failure scenarios.
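To make pillars 2 and 3 above a bit more concrete, here is a minimal Python sketch of one common combination: consistent hashing to shard a key namespace across nodes, with each key replicated on the next distinct nodes clockwise around the hash ring (the placement scheme popularized by Amazon's Dynamo). Everything here is an illustrative assumption rather than a prescription: the hypothetical ConsistentHashRing class, MD5 as the ring hash, 64 virtual nodes per server, and a replication factor of 3. Other systems shard by key range, use a central directory/metadata service, or use CRUSH-style placement instead.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical sketch: shard keys across nodes and choose replica nodes."""

    def __init__(self, nodes, vnodes_per_node=64, replication_factor=3):
        self.replication_factor = replication_factor
        self._ring = []  # sorted list of (ring position, node name)
        for node in nodes:
            # Virtual nodes spread each server over many ring positions,
            # which evens out the share of the namespace each server owns.
            for i in range(vnodes_per_node):
                self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]
        self._num_nodes = len(set(nodes))

    def replicas_for(self, key):
        """Return the distinct nodes holding `key`: the shard owner (primary)
        first, then the next distinct nodes clockwise on the ring."""
        wanted = min(self.replication_factor, self._num_nodes)
        # First ring position strictly after the key's hash (wraps around).
        start = bisect.bisect(self._positions, _hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in chosen:  # skip extra vnodes of an already-chosen node
                chosen.append(node)
                if len(chosen) == wanted:
                    break
        return chosen


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
    for key in ["/users/alice/photo.jpg", "orders:42", "doc-7"]:
        print(key, "->", ring.replicas_for(key))
```

The design choice worth noticing: when a node leaves, only the ring positions owned by that node's virtual nodes change hands, so keys whose primary was elsewhere keep the same owner. Rebalancing traffic therefore scales with the membership change rather than with the whole dataset.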
I will deep-dive into the design choices of each of these pillars in the following posts. 
