I am starting a series of posts to define a taxonomy for scale-out distributed storage systems. These systems were originally deployed at scale by Web 2.0 companies such as Google, Facebook, and LinkedIn, and are now seeing mainstream enterprise adoption in the form of Big Data solutions, cloud-provider backends, appliances (VMware's EVO:RAIL, Nutanix, etc.), scale-out volume managers (VMware's vSAN), and distributed filesystems (Isilon's OneFS, IBM's GPFS SNC, etc.).
A distributed data management system exposes interfaces such as blocks (volume managers), files (filesystems), tables, key-value pairs, tuples, etc. The interface typically supports standard data and metadata operations: create, read, write, delete, and list. Under the covers, these systems implement multiple services to provide consistency, availability, fault tolerance, durability, performance, and manageability. These services address the hard distributed-systems problems of asynchrony, partial failures, and unbounded component latencies. They can be viewed as "microservices" (in cloud terminology) and implemented either as stand-alone, stateless daemons or inline in the read/write path.
While there are different ways to slice and dice the internals of a distributed data management system, we propose an overlay taxonomy: basic functionality forms the foundation, and advanced capabilities are added incrementally as overlays. We divide the services into seven categories:
- Core Operations (Part I): These are the bare-bones services required for a distributed data management framework. In other words, they support the interfaces for basic persistence and retrieval of data in a shared-nothing scale-out environment. The key services are:
- Cluster Management
- Namespace Metadata Management
- Data Persistence
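To make the core operations concrete, here is a minimal sketch of the basic data path, with namespace metadata management reduced to a deterministic key-to-node mapping. All names (`MiniKVStore`, `_owner`) are hypothetical, and each "node" is just an in-memory dict; a real system would wrap an RPC client and an on-disk storage engine.

```python
import hashlib

class MiniKVStore:
    """Toy sketch of the core data path: a client maps each key to one of
    N shared-nothing nodes and performs the basic persistence operations."""

    def __init__(self, num_nodes=3):
        # Each "node" is just a dict; a real system would use RPC + disk.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _owner(self, key):
        # Namespace metadata management reduced to a hash function:
        # a deterministic key -> node mapping.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def write(self, key, value):
        self._owner(key)[key] = value

    def read(self, key):
        return self._owner(key).get(key)

    def delete(self, key):
        self._owner(key).pop(key, None)

    def list(self):
        # Metadata "list" operation gathered across all nodes.
        return sorted(k for node in self.nodes for k in node)
```

Note that cluster management is entirely absent here: the node set is fixed at construction time. The later categories (durability, elasticity) relax exactly that assumption.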
- Durability (Part II): This category of services is responsible for ensuring near-zero data loss despite hardware and component failures. Additionally, reliable persistence needs to be accomplished with reasonable latency. The key services are:
- Replica Placement
- Replication/Redundancy Orchestration
- Replica Repair
- Data Integrity/Scrubbing
- Geo-redundancy
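Two of these durability services can be sketched briefly: replica placement (picking distinct nodes for each copy) and scrubbing (comparing checksums across replicas and repairing divergent copies). The function names and the walk-the-node-list placement policy are illustrative assumptions; real placement also weighs racks, failure domains, and capacity.

```python
import hashlib

def place_replicas(key, nodes, rf=3):
    """Pick rf distinct nodes for a key's replicas: start at the
    hash-designated primary and walk the node list (a deliberately
    simplified placement policy)."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

def scrub(replicas):
    """Data-integrity scrub: compare checksums across replicas and
    repair divergent copies from the majority value."""
    checksums = [hashlib.sha256(r["data"]).hexdigest() for r in replicas]
    majority = max(set(checksums), key=checksums.count)
    good = next(r for r, c in zip(replicas, checksums) if c == majority)
    repaired = 0
    for r, c in zip(replicas, checksums):
        if c != majority:
            r["data"] = good["data"]  # replica repair from a healthy copy
            repaired += 1
    return repaired
```

In practice scrubbing runs as a background daemon over checksummed on-disk blocks rather than a majority vote over in-memory copies, but the detect-then-repair loop is the same.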
- Consistency (Part III): Broadly, this category of services ensures an understandable read-write coherence and concurrency model for application developers. A different flavor of consistency is crash consistency, where the system recovers to a consistent state after component failures. Additionally, functionality such as multi-object transactions falls into this category. The key services are:
- Read-Write Serialization (Single node)
- Write-Write Serialization (Single node)
- Serialization of writes across replicas
- Serialization of reads across replicas
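One common mechanism behind cross-replica serialization is quorum overlap: with N replicas, requiring W write acknowledgements and R read acknowledgements such that R + W > N guarantees every read quorum intersects the latest write quorum. The sketch below models this with in-memory replicas; the class name and the "first W / last R replicas" shortcut are illustrative assumptions.

```python
class QuorumReplicaSet:
    """Sketch of cross-replica serialization via quorums (R + W > N)."""

    def __init__(self, n=3, w=2, r=2):
        assert r + w > n, "quorum overlap requires R + W > N"
        self.replicas = [{} for _ in range(n)]  # key -> (version, value)
        self.n, self.w, self.r = n, w, r
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # Model a partial write: only W of the N replicas see it
        # (a real system waits for any W acks, not a fixed prefix).
        for rep in self.replicas[: self.w]:
            rep[key] = (self.version, value)

    def read(self, key):
        # Query R replicas and return the highest-versioned value;
        # the overlap guarantees at least one replica has the latest.
        hits = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(hits)[1] if hits else None
```

Single-node read-write and write-write serialization (the first two bullets) are usually handled separately, with locks or a log inside each storage node.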
- HA/FT (Part IV): These services ensure that the system is always available to serve requests despite failures; this essentially translates to having enough redundancy, fault tolerance, and repair/failback capability. The key services are:
- Data consistency during failures
- Failure detection
- Failover orchestration
- Repair and failback
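Failure detection in asynchronous systems is typically timeout-based: a node that stays silent past a threshold is declared suspect and handed to failover orchestration. A minimal sketch, with hypothetical names and a fixed timeout (production detectors often use adaptive, phi-accrual-style thresholds):

```python
import time

class HeartbeatDetector:
    """Timeout-based failure detection: nodes report heartbeats, and a
    node silent longer than `timeout` seconds is declared suspect."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_seen = {}  # node -> timestamp of last heartbeat

    def heartbeat(self, node, now=None):
        self.last_seen[node] = time.monotonic() if now is None else now

    def suspects(self, now=None):
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)
```

Note the inherent trade-off: a timeout can never distinguish a crashed node from a slow one, which is why the "data consistency during failures" service above must tolerate false suspicions.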
- Elasticity (Part V): Self-balancing with respect to the addition and removal of nodes, mitigation of hot-spots, and the ability to dynamically shape traffic across services and hardware resources. Key services include:
- Re-sharding
- Re-mapping
- Auto-scaling
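Re-sharding and re-mapping are commonly built on consistent hashing, so that adding or removing a node moves only the keys in that node's arcs rather than rehashing everything. A sketch under the usual virtual-node scheme (class and parameter names are hypothetical):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Re-mapping via a consistent-hash ring: node churn moves only
    O(K/N) keys instead of forcing a full rehash."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each node claims `vnodes` points on the ring for smoother balance.
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def owner(self, key):
        # A key belongs to the first ring point at or after its hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]
```

Auto-scaling then becomes a control loop around this mechanism: watch load, add or remove nodes, and let the ring dictate the (small) set of shards to migrate.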
- Manageability: A broad category covering ease of deployment, upgrade, maintenance, monitoring, policy-based differentiated QoS, and security configuration. Key services include:
- Upgrade/maintenance
- Authentication/Authorization/Multi-tenancy
- Policies for placement/Differentiated QoS
- Monitoring/Diagnosis/Alerts/Events/Stats
- Enterprise Data Services: This category includes capabilities familiar from enterprise storage controllers, namely snapshots, backup, tiering (including to the cloud), dedup, compression, and encryption. It also covers legacy compatibility such as ACLs, rename, and POSIX hard and soft links. Key services include:
- Snapshot/Clone
- Dedup/compression
- Access Control/Security
- Encryption (at rest/in motion)
- POSIX Rename/Hard/Soft link
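Of these, dedup and compression compose naturally in a content-addressed store: chunks are keyed by their content hash, so identical chunks are stored once, and each unique chunk is compressed before hitting the backend. A minimal sketch (the class name and dict-backed "backend" are assumptions for illustration):

```python
import hashlib
import zlib

class DedupStore:
    """Sketch of inline dedup + compression over a content-addressed
    chunk store: identical chunks stored once, unique chunks compressed."""

    def __init__(self):
        self.chunks = {}    # sha256 hex digest -> compressed bytes
        self.refcount = {}  # digest -> number of logical references

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:  # otherwise this is a dedup hit
            self.chunks[digest] = zlib.compress(data)
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def get(self, digest):
        return zlib.decompress(self.chunks[digest])
```

Snapshots and clones fall out of the same structure: a snapshot is just a saved map of logical offsets to chunk digests, with refcounts keeping shared chunks alive.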