Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined infrastructure, microservices, and DevOps.
Why: We are living in an era of continuous change and a low barrier to entry. Net result: a lot of noise!
What: Sharing the expertise I have gained over nearly two decades in extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become cited trends and pain points.
How: NOT by reading tea leaves! Instead, by synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Friday, September 11, 2015

6 key hardware trends in the context of Software-defined Storage and Big Data Platforms

Often, I get into discussions where fellow engineers/architects are fixated solely on the properties of the emerging storage hierarchy across disks, flash, and NVM. We are in an era with significant diversity of storage devices in terms of $/GB, $/IOPS, latency, capacity, throughput, and write endurance. While storage is an important part of the overall solution puzzle, it is easy to forget that it is still just a “part” and not the whole puzzle. In going from disk to flash to NVM, IO latencies have shifted from milliseconds to microseconds to nanoseconds, respectively. Does this mean that we can simply add NVM or SLC flash to storage arrays and completely harness the radically improved service times? Of course not. To generalize the case in point, the design of Software-defined Storage and Big Data Platforms needs to be vetted with a holistic analysis across the core building blocks, i.e., CPU, memory, network, and storage. The goal of this blog post is to highlight the relevant trends that engineers, system administrators, and CIOs should keep in perspective while developing or adopting next-generation data platform solutions.
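
To make the “part, not the whole puzzle” argument concrete, here is a minimal back-of-envelope sketch in Python. Every number in it is an illustrative assumption, not a measurement: it models end-to-end IO service time as a fixed software/network overhead plus the device access time, then swaps the device tier.

    # Toy end-to-end IO latency model -- every number here is an assumption.
    # The fixed overhead approximates the software stack + network round trip,
    # which does not shrink when the storage device gets faster.
    OVERHEAD_US = 20.0                      # assumed stack + network cost (us)

    DEVICE_LATENCY_US = {                   # assumed device access times (us)
        "disk":  8000.0,                    # ~8 ms
        "flash":  100.0,                    # ~100 us
        "nvm":      0.5,                    # ~500 ns
    }

    disk_total = OVERHEAD_US + DEVICE_LATENCY_US["disk"]
    for device, lat in DEVICE_LATENCY_US.items():
        total = OVERHEAD_US + lat
        print(f"{device:5s}: total={total:8.1f}us, "
              f"end-to-end speedup vs disk = {disk_total / total:6.1f}x")

Even though the raw device is ~16,000X faster going from disk to NVM in this toy model, the end-to-end IO improves only ~390X because the fixed overhead now dominates. That is exactly why the analysis has to span CPU, memory, network, and storage.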

  1. CPU Scaling: Slowdown of Moore’s law: A few camps believe that Moore’s law is already dead, while others believe its death is imminent. Either way, the key point is that we are no longer able to double compute performance every 18 months. Instead, we have settled for multi-core, which from the software standpoint provides only ~30-40% scaling per year, given our limited ability to truly exploit parallel programming models. With data growing at an exponential rate, the key takeaway is the shrinking availability of CPU cycles per unit of data (see the first sketch after this list).
  2. Memory gets faster and cheaper: At the beginning of the decade, volatile memory within servers was a scarce resource (I remember spending endless hours on metadata optimizations to avoid paging overheads). The maturing of fabrication technologies has turned the dial w.r.t. average size and $/GB. Standard server configurations are available today with 256GB of memory, compared to a few years back when 8-16GB was the norm.
  3. Latency lags Bandwidth: This is an observation from David Patterson in 2004, based on his detailed analysis across CPU, network, memory, and storage. To illustrate, from the mid-1980s to 2009, disk bandwidth increased from roughly 2MB/s to 100MB/s (a 50X improvement), but latencies only improved from 20ms to 10ms (2X). Discontinuous innovation in storage technologies with NVM is creating an outlier on the latency side, but bandwidth can potentially make a similar jump with the emergence of the NVMe over Fabrics effort. Overall, latency lags bandwidth will continue to hold in the long run (see the second sketch after this list).
  4. Distinct Performance and Capacity tiers: The diversity in storage technologies is creating distinct tiers of storage w.r.t. $/IOPS and $/GB. Rotating disks will continue to be the leader w.r.t. $/GB, but are expensive on a $/IOPS basis compared to flash and NVM. The distinction clearly lends itself to innovation across caching and tiering technologies (see the third sketch after this list).

  5. Network is the new bottleneck: With disks at 6-8ms, latency optimization in the remaining IO stack was inconsequential. With NVM at a few hundred nanoseconds, every remaining optimization, most notably the network round trip, is now critical. In fact, the best-case network round trip of 1-2 microseconds is now the bottleneck in the overall NVM-based IO path (see the fourth sketch after this list).
     
  6. Closing of the latency gap: In the disk era, the latency to access data on disk was on the order of milliseconds. There was a multiple-orders-of-magnitude latency difference between a cache/memory hit (order of nanoseconds) and an access from disk (order of milliseconds). This was referred to as the “latency gap.” With the adoption of NAND flash within enterprises, the latency of a miss improved by roughly 1000X (to the order of microseconds). With upcoming byte-addressable NVM, another 1000X speedup (to the order of nanoseconds) is now imminent, essentially closing the “latency gap” from a technology-physics standpoint (see the fifth sketch after this list)! Standard servers will be available in configurations mixing NVM and DDR RAM on the memory controller, flash on the PCIe bus, and disks on the SAS/SATA controller.
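
First sketch (trend #1, cycles per unit of data): a minimal back-of-envelope, assuming (purely for illustration) ~35% effective compute scaling per year from multi-core versus ~60% annual data growth; neither rate comes from a specific study.

    # Assumed growth rates -- illustrative only, not from a specific study.
    COMPUTE_GROWTH = 1.35    # ~30-40% effective multi-core scaling per year
    DATA_GROWTH    = 1.60    # assumed exponential data growth per year

    for year in range(0, 11, 2):
        ratio = (COMPUTE_GROWTH / DATA_GROWTH) ** year
        print(f"year {year:2d}: CPU cycles per unit of data = {ratio:5.2f}")

Under these assumptions, a decade out a platform has less than one-fifth of today’s CPU cycles per byte of data, which is why per-byte processing cost matters so much in data platform design.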
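
Second sketch (trend #3, latency lags bandwidth): a quick check of the improvement factors using the disk figures cited above, plus Patterson’s rule of thumb that bandwidth improves by at least the square of the improvement in latency.

    # Disk figures as cited above (mid-1980s vs. 2009).
    bw_old_mb,  bw_new_mb  = 2.0, 100.0     # bandwidth, MB/s
    lat_old_ms, lat_new_ms = 20.0, 10.0     # latency, ms

    bw_gain  = bw_new_mb / bw_old_mb        # 50x
    lat_gain = lat_old_ms / lat_new_ms      # 2x
    print(f"bandwidth improved {bw_gain:.0f}X, latency improved {lat_gain:.0f}X")
    # Patterson's rule of thumb: bandwidth improves by at least the square
    # of the improvement in latency (here 50 >= 2**2).
    print("rule of thumb holds:", bw_gain >= lat_gain ** 2)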
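
Third sketch (trend #4, distinct tiers): the economics of caching/tiering fall out of simple division. All prices, IOPS, and capacities below are illustrative assumptions only.

    # Illustrative device economics -- all prices/IOPS/capacities are assumed.
    #  name      $/GB   IOPS/device  capacity (GB)
    devices = {
        "hdd":   (0.05,     150,      4000),
        "flash": (0.50, 100_000,      1000),
    }
    for name, (usd_per_gb, iops, cap_gb) in devices.items():
        usd_per_iops = (usd_per_gb * cap_gb) / iops
        print(f"{name:5s}: ${usd_per_gb:.2f}/GB, ${usd_per_iops:.4f}/IOPS")

With these toy numbers, disk is 10X cheaper per GB but over 250X more expensive per IOPS. That asymmetry is precisely what caching and tiering designs exploit: hot, IOPS-heavy working sets on flash/NVM, cold capacity on disk.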
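
Fourth sketch (trend #5, network as the new bottleneck): one line of arithmetic, assuming (illustratively) a 500ns NVM access and a best-case 1.5us network round trip.

    # Share of a remote IO spent on the network -- illustrative numbers.
    nvm_us  = 0.5       # assumed NVM device access time (us)
    disk_us = 8000.0    # assumed disk access time (us)
    rtt_us  = 1.5       # assumed best-case network round trip (us)

    print(f"network share, NVM-based IO:  {rtt_us / (nvm_us + rtt_us):.0%}")
    print(f"network share, disk-based IO: {rtt_us / (disk_us + rtt_us):.3%}")

The same round trip that was noise against a disk (~0.02% of the IO) is 75% of an NVM-based IO in this toy model, which is why the network now sets the floor.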
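
Fifth sketch (trend #6, closing of the latency gap): the shrinking ratio between a memory hit and a miss to the fastest persistent tier, using the order-of-magnitude latencies assumed above.

    # Order-of-magnitude miss penalties relative to a DRAM hit (illustrative).
    DRAM_HIT_NS = 100
    FASTEST_PERSISTENT_TIER_NS = {
        "disk era":  10_000_000,    # ~10 ms
        "flash era":    100_000,    # ~100 us
        "NVM era":          300,    # ~300 ns
    }
    for era, miss_ns in FASTEST_PERSISTENT_TIER_NS.items():
        print(f"{era:9s}: a miss costs {miss_ns / DRAM_HIT_NS:>9,.0f}X a DRAM hit")

The miss penalty collapses from five orders of magnitude to a small constant factor. That collapse is what “closing the latency gap” means in practice, and it changes where caching effort actually pays off.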


To summarize, keep these trends in mind the next time you discuss Enterprise Storage and Big Data Platforms -- try to analyze how future-proof the solutions are against the common denominator, i.e., the hardware trends across CPU, memory, network, and storage.
