Three Pillars of a Scale-out Storage Architecture
Given the data deluge, customers are actively exploring scale-out solutions for their enterprise storage architectures. Web 2.0 companies such as Google, Facebook, and LinkedIn have been forerunners in managing explosive data growth for their respective applications. Their secret sauce has been developing customized "Shared Nothing Architectures" that pool together storage from the local disks of individual servers within the cluster.
The design space of Shared Nothing Storage Architectures is huge! Several architecture choices need to be made, such as sharding, local persistence, metadata management, replication, fault tolerance, transaction support, consensus management, and so on (you get the picture). Existence proofs for these design choices include emerging distributed file systems, traditional HPC file systems, NoSQL and NewSQL systems, in-memory data grids, and the cloud storage architectures published by Google, Amazon, and others.
In my opinion, there are three core building blocks that essentially dictate the "personality" of a scale-out storage architecture at the 10,000 ft level. In other words, before sinking your teeth into all the fun low-level pieces of the architecture, it is important to understand these building blocks, both as an architect and as an administrator looking to deploy these solutions:
- Control Taxonomy: How the data and metadata activities are divided among the nodes of the cluster
- Data Sharding: How the namespace (block/file/object/KV/document/..) is mapped across hardware resources within the cluster
- Replication Strategy: How data redundancy is enforced, its impact on CRUD operations, and the breadth of supported failure scenarios
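
To make the sharding and replication bullets a bit more concrete, here is a minimal Python sketch of one possible combination: hash-based sharding of keys, with each shard's replicas placed on consecutive nodes. This is only an illustration under stated assumptions, not how any particular system does it. The function names (`shard_for`, `replica_nodes`), the node names, and the placement rule are all hypothetical, and real systems typically use more elaborate schemes (range partitioning, consistent hashing, rack-aware placement, and so on).

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key (file path, object ID, KV key, ...) to a shard number.

    Assumption: simple hash-mod sharding; a stable hash is used so every
    node in the cluster computes the same mapping.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replica_nodes(shard: int, nodes: List[str], replication_factor: int) -> List[str]:
    """Pick the nodes holding copies of a shard.

    Assumption: replicas live on consecutive nodes, wrapping around the
    node list. The first entry can be treated as the primary.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds cluster size")
    start = shard % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
    shard = shard_for("/videos/cat.mp4", num_shards=16)
    print(shard, replica_nodes(shard, cluster, replication_factor=3))
```

Even in a toy like this, the three building blocks interact: the replication factor bounds how many node failures a shard survives, and the placement rule decides which nodes must be contacted on every write.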