Mapping Technology Trends to Enterprise Product Innovation

Scope: Focusses on enterprise platform software: Big Data, Cloud platforms, software-defined, micro-services, DevOps.
Why: We are living in an era of continuous change, and a low barrier to entry. Net result: Lot of noise!
What: Sharing my expertise gained over nearly two decades in the skill of extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become cited trends and pain-points.
How: NOT based on reading tea leaves! Instead synthesizing technical and business understanding of the domain at 500 ft. 5000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Wednesday, November 12, 2014

Part 0: Nuts-and-bolts of a Scale-out Distributed Storage System

I am starting a series to post to define the taxonomy of a scale-out distributed storage system. These systems were originally applied at scale in Web 2.0 companies namely Google, Facebook, LinkedIn, etc., and now are getting mainstream enterprise adoption in the form of Big Data Solutions, Cloud Provider backend, Appliances (VMW's EVO:RAIL, Nutanix, etc.), scale-out volume managers (VMW's vSAN), distributed filesystems (Isilon's OneFS, IBM's GPFS SNC, etc.).

A distributed data management system exposes interfaces such as blocks (volume managers), files (filesystems), tables, KV, tuples, etc. The interface typically supports standard data and metadata operations to create, read, write, delete, list. Under the covers, these systems implement multiple services to provide consistency, availability, fault tolerance, durability, performance, manageability. These services address the hard distributed problems for dealing with asynchrony, partial failures, unbounded component latencies. These services can be viewed as “micro services” (in cloud terminology), and implemented as stand-alone, stateless daemons, or in-line during read/write operations.

While there are different ways to slice-and-dice the internals of a distributed data management system, we propose an overlay taxonomy where the basic functionality is supported, and advanced capabilities can be added incrementally as overlays.  We divide into seven categories:

  • Core Operations (Part I): This represents the bare-bones services required for a distributed data management framework. In other words, it supports the interfaces for basic persistence and retrieval of data within a shared nothing scale-out environment. The key services are:
    • Cluster Management
    • Namespace Metadata Management
    • Data Persistence
  • Durability (Part II): This category of services is responsible to ensure near zero data loss despite hardware and component failures. Additionally, the reliable persistence needs to be accomplished with reasonable latency. The key services are:
    • Replica Placement
    • Replication/Redundancy Orchestration
    • Replica repair
    • Data Integrity/Scrubbing
    • Geo-redundancy service
  • Consistency (Part III): Broadly, this category of services ensures an understandable read-write coherence and concurrency model for the application developers A different flavor of consistency is crash consistency, where the system recovers from component failures. Additionally, functionality such as multi-object transactions falls in this category. The key services are:
    • Read-Write Serialization (Single node)
    • Write-Write Serialization (Single node)
    • Serialization of writes across replicas
    • Serialization of reads across replicas
  • HA/FT (Part IV): These services ensure that the system is always available to serve requests despite failures – this essentially translates to having enough redundancy, fault tolerance, and repair/failback capabilities. The key services are:
    • Data consistency during failures
    • Failure detection
    • Failover orchestration
    • Repair and failback
  • Elasticity (Part V): Self-balancing w.r.t. addition and removal of nodes, hot-spots and ability for dynamic traffic shaping across services and hardware resources.  Key services include:
    • Re-sharding
    • Re-mapping
    • Auto-scaling
  • Manageability: Broad category for ease of deployment, upgrade, maintenance, monitoring, policy-based differentiated QoS, security configuration.  
    • Upgrade/maintenance
    • Authentication/Authorization/Multi-tenancy
    • Policies for placement/Differentiated QoS
    • Monitoring/Diagnosis/Alerts/Events/Stats 
  • Enterprise Data Services: This category includes capabilities similar to enterprise storage controllers namely snapshots, backup, tiering (including the cloud), dedup, compression, encryption. This bucket also includes legacy support including ACL support, rename support, POSIX hard and soft links.

    • Snapshot/Clone
    • Dedup/compression
    • Access Control/Security/
    • Encryption (at rest/in motion)
    • POSIX Rename/Hard/Soft link

No comments:

Post a Comment