How is data persisted within a Scale-out Storage System?
In a Shared Nothing scale-out storage system, data is distributed across multiple nodes, and each node persists its share of the data on local disk or Flash. There are several design patterns for accomplishing this persistence:
- Local Filesystem (Vanilla): This is the most common pattern -- the data on each node is tracked as files in a local filesystem such as ext3, ext4, XFS, or VMFS. HDFS/GFS, Lustre, PVFS, and Azure Storage are some of the systems that use this pattern. Typically, there is no 1:1 mapping between an object in the scale-out namespace and a local file, because objects are striped across nodes -- for instance, in HDFS a file is striped across nodes in 64MB chunks, and each chunk is persisted as a file in ext3 (see the striping sketch after this list).
- Local Filesystem (Record-oriented): In the vanilla model, each local file stores only the raw data of a single stripe. A simple file-per-stripe layout is not optimal for most enterprise workloads, so the data is instead tracked as a collection of records packed within a small number of large files.
- Log Structured: Individual updates to objects are persisted as records appended to a log file, combined with some form of B+-tree or hashtable to track where each record lives. Log structuring delivers high performance on disk-based storage by converting random writes into sequential ones (see the log-structured sketch after this list).
- SSTables (Sorted String Tables): In Cassandra, data is persisted as immutable SSTables; updates accumulate in memory and are flushed to disk as new tables, which are later merged by compaction (see the SSTable sketch after this list).
- Multiple stripes-per-file: In PVFS, in contrast to HDFS's stripe-per-file approach, all the stripes of a given file/object that land on a node are persisted within a single local file.
- Physical disks: IBM's GPFS, Microsoft's Flat Datacenter Storage, and others allocate and track data directly on physical disks. This gives the system full control over the layout of data on the disk and removes a layer of indirection from the IO path. Logical volumes can also be used in a similar fashion in place of physical disks.
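
To make the vanilla striping pattern concrete, here is a minimal single-machine sketch. The `stripe_file` helper and `chunk_dir` layout are hypothetical: each chunk lands in a local directory standing in for a different node's local filesystem, and the returned list stands in for the chunk-to-node mapping a metadata service would track.

```python
import os

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB stripes, as in classic HDFS

def stripe_file(src_path, chunk_dir, chunk_size=CHUNK_SIZE):
    """Split a logical file into fixed-size chunks, each persisted as an
    ordinary local file (hypothetical stand-in for per-node storage)."""
    os.makedirs(chunk_dir, exist_ok=True)
    chunk_paths = []
    with open(src_path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # In a real system each chunk would land on a different node;
            # here every "node" is just a file in chunk_dir.
            path = os.path.join(chunk_dir, f"chunk_{index:06d}")
            with open(path, "wb") as out:
                out.write(chunk)
            chunk_paths.append(path)
            index += 1
    return chunk_paths  # the metadata service would record this mapping
```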
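The log-structured pattern can be sketched as an append-only key/value log plus an in-memory hash index. This is a toy illustration under simplifying assumptions (no crash recovery, no compaction, single-threaded), not any particular system's record format:

```python
import struct

class LogStructuredStore:
    """Minimal append-only key/value log with an in-memory hash index.
    Random logical updates become sequential physical writes; the index
    maps each key to the offset of its most recent record."""

    def __init__(self, path):
        self.log = open(path, "ab+")
        self.index = {}  # key -> offset of the latest record

    def put(self, key: bytes, value: bytes):
        self.log.seek(0, 2)           # always append at the tail
        offset = self.log.tell()
        header = struct.pack(">II", len(key), len(value))
        self.log.write(header + key + value)
        self.log.flush()
        self.index[key] = offset      # newest record wins

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)
        klen, vlen = struct.unpack(">II", self.log.read(8))
        self.log.seek(klen, 1)        # skip past the key bytes
        return self.log.read(vlen)
```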
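An SSTable, in the same spirit, is an immutable sorted run of records: reads binary-search it, and updates go into new tables that are later folded together by compaction. The sketch below is an in-memory approximation under stated assumptions -- a real SSTable lives on disk with a sparse index and bloom filters:

```python
import bisect

class SSTable:
    """Immutable sorted run of key/value pairs, in the spirit of
    Cassandra's on-disk tables (toy in-memory sketch)."""

    def __init__(self, entries):
        # entries: a dict written out once, sorted by key, never mutated
        self.keys = sorted(entries)
        self.values = [entries[k] for k in self.keys]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

def compact(*tables):
    """Merge several immutable tables into one new table; later tables
    win, mirroring how compaction folds newer SSTables over older ones."""
    merged = {}
    for t in tables:
        merged.update(dict(zip(t.keys, t.values)))
    return SSTable(merged)
```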