In this post, we cover Core Services.
These represent the bare-bones services for a distributed
management framework.
· Cluster Management: Responsible for tracking the state and membership of the nodes within the cluster. It involves the following aspects:
- Tracking cluster membership: This can be centralized or distributed. The service tracks the state (up, failed, partitioned) and membership of each node within the cluster.
- Role selection process: Depending on the cluster taxonomy (symmetric versus asymmetric), the service needs to decide how the data and metadata coordination tasks are distributed among the nodes. The service could trigger a reassignment of the namespace or the election of a new master/coordinator using a consensus protocol among the cluster components.
- Communication middleware between cluster components: This is the middleware by which components communicate within the cluster. Ideally, the middleware should provide in-order, guaranteed delivery of messages.
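To make the membership-tracking aspect concrete, here is a minimal sketch of a centralized tracker driven by heartbeats. The names (`MembershipTracker`, `timeout_s`, the node ids) are hypothetical and chosen only for illustration. Picking the lowest live node id as coordinator is a deliberate simplification standing in for a real consensus protocol, and the sketch does not attempt to tell a failed node apart from a partitioned one.

```python
import time
from enum import Enum


class NodeState(Enum):
    UP = "up"
    FAILED = "failed"


class MembershipTracker:
    """Centralized tracker: a node that stays silent past the timeout is marked failed."""

    def __init__(self, timeout_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable clock, which makes the tracker easy to test
        self.last_seen = {}         # node id -> timestamp of the last heartbeat

    def heartbeat(self, node_id):
        """Record a heartbeat; an unknown node joins the cluster implicitly."""
        self.last_seen[node_id] = self.clock()

    def state(self, node_id):
        """Return UP if the node reported within the timeout, FAILED otherwise."""
        age = self.clock() - self.last_seen[node_id]
        return NodeState.UP if age <= self.timeout_s else NodeState.FAILED

    def live_nodes(self):
        return sorted(n for n in self.last_seen if self.state(n) is NodeState.UP)

    def coordinator(self):
        """Pick the lowest live node id (a stand-in for a consensus-based election)."""
        live = self.live_nodes()
        return live[0] if live else None
```

In a distributed variant the same bookkeeping would typically be spread across nodes (for example via gossip) rather than held by a single tracker.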
· Namespace Metadata Management: Responsible for tracking the namespace exported to applications, and the persistence and consistency of the metadata.
- Sharding: Dividing the namespace-serving responsibility among the nodes.
- Deciding the namespace-to-resource mapping.
- Object lookup strategies: How the client accesses the object.
- Handling heterogeneous hardware: Accounting for different hardware capabilities.
- Metadata persistence and durability: Division of responsibility among cluster components in maintaining metadata: a) metadata structures for tracking (superblock, inode, dentry, fd); b) free-space tracking and allocation (allocating space in parallel, bookkeeping of free space); c) operation log structure. This also includes metadata durability, i.e., how metadata is replicated within the system.
- Metadata serialization: Coordinating metadata changes across multiple concurrent updates.
- Metadata crash recovery: Ensuring a consistent state of metadata after a crash: a) transactional updates across metadata components; b) rollback for failed updates.
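One common way to approach the sharding and lookup aspects above is consistent hashing, sketched below. This is an illustrative technique, not necessarily the one any particular system uses. The class name `HashRing`, the `weight` parameter, and the node names are all hypothetical. Giving a node a larger `weight` places more virtual points on the ring, which is one simple way to account for heterogeneous hardware.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring that maps namespace paths to metadata nodes."""

    def __init__(self, vnodes_per_weight=64):
        self.vnodes_per_weight = vnodes_per_weight
        self._points = []  # sorted list of (ring position, node name)

    def add_node(self, node, weight=1):
        # A more capable node gets a larger weight and therefore more ring points.
        for i in range(self.vnodes_per_weight * weight):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, path):
        """Return the node that owns `path`: the first ring point at or after its hash."""
        if not self._points:
            raise LookupError("no nodes in ring")
        idx = bisect.bisect_left(self._points, (_hash(path),))
        return self._points[idx % len(self._points)][1]  # wrap around the ring
```

The useful property is that removing a node only remaps the paths that node owned; every other path keeps its owner, which limits metadata movement when membership changes.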
· Data Persistence: Deals with making sure that committed data is reliably persisted, with reasonable parallelism and latency (given that disks have traditionally been the slowest component in the I/O path).
- Write Buffering: Deferring updates to persistent media, essentially equivalent to group commit semantics in databases. In the era of hard-disk latencies, buffering was critical for application performance. Buffering must still honor the coherence semantics of read-write operations, so that reads reflect buffered updates.
- Caching: Maintaining commonly accessed data in memory to reduce traffic to the persistent storage. Caching typically refers to reads, while buffering refers to writes. The aspects involved in the caching design are:
  - What: Data versus metadata layout
  - Where: Clients, metadata nodes, data nodes
  - When: Prefetching, on reads, on writes, -access, ...
  - How:
    - Algorithms to cache the most active data
    - In-memory layout
  - Invalidation protocol:
    - Handling data updates
    - Handling metadata changes (ACL, resource map, ...)
  - Multi-data-center caching coordination
  - Bolt-on caching models: Memcache, INDG, ...
- Persistence Storage Writer: This service is responsible for writing data to disk. There are multiple aspects involved:
  - Layout on disk: This involves the following:
    - On-disk format
    - Striping data across disks and nodes
    - Tiering of data across memory/flash/disks
    - Allocation across storage resources via LVM or a local FS
  - Data mutability model: Some of the aspects involved are: a) how new writes are persisted; b) tracking updates to existing data; c) tracking deletes in immutable storage
- Garbage collection: For out-of-place updates, reclaiming the space for the original blocks.
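The data mutability and garbage collection aspects can be illustrated with a toy append-only store, sketched below under simplifying assumptions: everything lives in memory, records are whole key/value pairs rather than blocks, and the names (`AppendOnlyStore`, `collect`, `TOMBSTONE`) are hypothetical. New writes and updates are appended out of place, deletes are recorded as tombstones, and garbage collection rewrites only the live records.

```python
TOMBSTONE = object()  # marker appended to the log to record a delete


class AppendOnlyStore:
    """Immutable log plus an index pointing at the latest live record per key."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) records
        self.index = {}  # key -> offset of the latest live record in the log

    def put(self, key, value):
        # Out-of-place update: the old record is left behind and becomes garbage.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        if key in self.index:
            self.log.append((key, TOMBSTONE))
            del self.index[key]

    def get(self, key):
        return self.log[self.index[key]][1]  # raises KeyError if the key is absent

    def garbage_ratio(self):
        """Fraction of log records that are stale versions or tombstones."""
        return 1 - len(self.index) / len(self.log) if self.log else 0.0

    def collect(self):
        """Copy live records into a fresh log; return the number of records reclaimed."""
        new_log, new_index = [], {}
        for key, offset in sorted(self.index.items(), key=lambda kv: kv[1]):
            new_index[key] = len(new_log)
            new_log.append(self.log[offset])
        reclaimed = len(self.log) - len(new_log)
        self.log, self.index = new_log, new_index
        return reclaimed
```

A real storage writer would usually trigger collection per segment once something like `garbage_ratio()` crosses a threshold, instead of rewriting the whole log at once.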