Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined infrastructure, microservices, DevOps.
Why: We are living in an era of continuous change and low barriers to entry. Net result: a lot of noise!
What: Sharing expertise gained over nearly two decades in extracting the signal from the noise -- more precisely, identifying shifts in ground realities before they become widely cited trends and pain points.
How: NOT by reading tea leaves! Instead, by synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50,000 ft.

(Disclaimer: Personal views not representing my employer)

Wednesday, December 22, 2010

A Balanced CPU-Storage System: A paradigm to commercialize the scale out commodity IT model

Some of you may be familiar with the terms fabric computing, skinless servers, or CMU's FAWN (Fast Array of Wimpy Nodes) -- they all refer to the same phenomenon of creating a balanced system. Let me introduce the topic to level-set.


What are Skinless Servers?
Imagine building an IT infrastructure for large-scale data analytics for Business Intelligence (BI) and data warehousing. There are essentially two popular architecture models:
1) Light-weight Commodity Components (Scale Out): Put together a large number of commodity servers with Direct Attached Storage (DAS). A software middleware layer is additionally required to virtualize the storage (similar to the Google File System or Bigtable) and to schedule tasks across the distributed system (similar to Hadoop). In short, there is no standard hardware-software bundle that is commercially available -- you need your own engineering teams, similar to Facebook, LinkedIn, and the like. (A toy sketch of this chunk-placement bookkeeping appears after this list.)
2) Heavy-weight Commercial Hardware (Scale Up): Another approach, more likely to be followed by a Fortune 500 company, is to leverage existing IT investments in heavy-weight servers attached to a networked, centralized storage fabric. Storage virtualization uses a commercial file system such as IBM's GPFS or EMC's Isilon. The data processing software is a commercial BI technology such as Cognos, SAP, Greenplum, etc.
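
To make the middleware in Bullet #1 concrete, here is a minimal, hypothetical Python sketch (made-up node names, not any vendor's actual API) of what "virtualizing the storage" boils down to: split each file into fixed-size chunks and hash every chunk onto a few commodity DAS nodes for redundancy.

    # Minimal sketch (not any vendor's API): hash-based placement of fixed-size
    # file chunks across commodity DAS nodes with N-way replication, i.e., the
    # kind of bookkeeping a GFS/Hadoop-style middleware layer does.
    import hashlib

    CHUNK_SIZE = 64 * 1024 * 1024                  # 64 MB chunks, a common default
    REPLICAS = 3                                   # copies per chunk, to ride out node failures
    NODES = ["node-%02d" % i for i in range(12)]   # hypothetical commodity servers

    def place_chunk(file_name, chunk_index, nodes=NODES, replicas=REPLICAS):
        """Pick `replicas` distinct nodes for one chunk of a file."""
        key = ("%s:%d" % (file_name, chunk_index)).encode()
        start = int(hashlib.md5(key).hexdigest(), 16) % len(nodes)
        # Place the copies on consecutive nodes starting at the hashed position.
        return [nodes[(start + r) % len(nodes)] for r in range(replicas)]

    def chunk_map(file_name, file_size):
        """Return {chunk_index: [replica nodes]} for a whole file."""
        num_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
        return {i: place_chunk(file_name, i) for i in range(num_chunks)}

    # A 500 MB file splits into eight 64 MB chunks, each stored on three nodes.
    for idx, owners in chunk_map("weblog.dat", 500 * 1024 * 1024).items():
        print("chunk %d -> %s" % (idx, owners))

A production system adds failure detection and re-replication on top of this, but the placement bookkeeping really is that simple.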

With Skinless Servers (my favorite term for this phenomenon), there is a third option in the mix. It is similar to the commodity scale-out model (Bullet #1), but with a much narrower CPU-storage performance gap. In other words, use a lower-power processor (even a mobile processor) paired with faster storage such as an SSD. Additionally, you should expect these individual low-power nodes to be unreliable, with high failure rates.
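
A quick, purely illustrative way to see what "balanced" means: compare the rate at which a node's CPU can chew through data against the rate at which its storage can deliver it. The numbers below are placeholders I made up for the sake of the ratio, not measurements.

    # Back-of-envelope balance check. All figures below are hypothetical
    # placeholders, chosen only to illustrate matching CPU and storage bandwidth.

    def balance_ratio(cpu_scan_mb_per_s, storage_read_mb_per_s):
        """>1: the CPU outruns storage and idles; ~1: the node is balanced."""
        return cpu_scan_mb_per_s / storage_read_mb_per_s

    configs = {
        # name: (CPU scan rate in MB/s, sequential read rate in MB/s)
        "beefy server + SATA disk": (2000.0, 100.0),
        "wimpy mobile CPU + SSD":   (300.0, 250.0),
    }

    for name, (cpu, storage) in configs.items():
        print("%-26s CPU/storage ratio = %4.1f" % (name, balance_ratio(cpu, storage)))

    # Output: the beefy node's ratio is ~20 (it spends most of its cycles waiting
    # on the disk), while the wimpy node's is ~1.2 -- the skinless-server premise
    # in a single number.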

So where is the magic?

You essentially get a "balanced system" in which the CPU is not burning cycles waiting for storage, and the storage IOPS (roughly 100 times those of spinning disks) are much closer to memory speeds. This combination has been shown to reduce power requirements by up to two orders of magnitude and to deliver better run-time performance for certain workloads. Also, the hardware components are cheap -- redundancy and fault tolerance are built into the software. The best reference for this work is in academia -- David Andersen's group at CMU. In summary, the magic is the power savings, reduced hardware cost, and near-infinite elasticity one can gain by balancing the compute, memory, network, and I/O bandwidths.
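
The CMU group measures this in queries per joule. Here is a rough arithmetic sketch of why the gap can reach two orders of magnitude for seek-bound key-value lookups; the per-node numbers are my own placeholders, not their measurements.

    # Energy efficiency as queries per joule -- the metric the FAWN work uses.
    # The per-node figures are illustrative placeholders, not published results.

    def queries_per_joule(queries_per_sec, watts):
        return queries_per_sec / watts

    # Random key-value lookups on spinning disks are seek-bound (~150 IOPS/disk),
    # so even a big server with 8 disks serves only ~1200 lookups/s.
    beefy = queries_per_joule(queries_per_sec=8 * 150, watts=300)   # ~4 q/J
    # A low-power node answering the same lookups from flash:
    wimpy = queries_per_joule(queries_per_sec=1500, watts=4)        # ~375 q/J

    print("disk-based server: %5.1f queries/joule" % beefy)
    print("wimpy node + SSD : %5.1f queries/joule" % wimpy)
    print("ratio            : %5.0fx" % (wimpy / beefy))            # ~2 orders of magnitude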


Some more insights
There is a lot of startup activity in this space, and VCs are investing heavily because the potential impact can be huge (think of being acquired for $2.2 B, as 3PAR was). Skinless servers can change the way public and private clouds are put together -- the promise is reduced TCO through power savings and cheaper hardware. Another way to look at this trend is from the perspective of the big hardware and software vendors -- instead of the Scale Up architecture (Bullet #2), they can now commercialize the commodity Scale Out model.

There is a long way to go -- my sense is that we are at least 3-5 years away from a commercially viable hardware packaging of Skinless Servers, as well as the software middleware to go with it. There are several interesting questions to address, some of which may warrant a clean-sheet approach:
  • Vanilla operating systems cannot handle the high I/O interrupt rate of an SSD -- the TCP stack needs to be rewritten or significantly optimized.
  • Storage virtualization across the nodes -- how to chunk the data, whether to move data or computation across nodes, data-processing parallelism, fault tolerance... Also issues like the reliability, lifetime, and wear-leveling of SSDs. (A consistent-hashing sketch of data placement follows this list.)
  • Specific applications for this kind of hardware -- is it essentially good for processing key-value stores? How about HPC workloads from media and entertainment, healthcare, and other market segments?
  • Homogeneity of hardware -- is that a requirement? How to accomplish incremental hardware refresh or scale-out over time?
  • ...
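
On the storage-virtualization bullet, one plausible building block -- the one FAWN-KV and Dynamo-style key-value stores use -- is consistent hashing: keys and nodes hash onto the same ring, each key lives on the next few nodes clockwise, and a failed node's keys spill over to its successors. A minimal sketch, with made-up node names:

    # Minimal consistent-hashing sketch (made-up node names, no virtual nodes,
    # no real failure detection). Losing a node only reshuffles that node's keys
    # onto its ring successors instead of rebalancing everything.
    import bisect
    import hashlib

    def _h(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes, replicas=3):
            self.replicas = replicas
            self.ring = sorted((_h(n), n) for n in nodes)   # (position, node)

        def owners(self, key):
            """The `replicas` nodes clockwise from the key's ring position."""
            points = [p for p, _ in self.ring]
            start = bisect.bisect(points, _h(key)) % len(self.ring)
            k = min(self.replicas, len(self.ring))
            return [self.ring[(start + j) % len(self.ring)][1] for j in range(k)]

        def remove(self, node):
            """Simulate a node failure; its keys fall to the next nodes clockwise."""
            self.ring = [(p, n) for p, n in self.ring if n != node]

    ring = Ring(["wimpy-%02d" % i for i in range(8)])
    print("key 'user:42' lives on", ring.owners("user:42"))
    ring.remove(ring.owners("user:42")[0])                  # the primary dies...
    print("after the failure     ", ring.owners("user:42"))

Virtual nodes, replica repair, and SSD wear-aware placement would all layer on top of this skeleton.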

In closing...
While academia solves the long-term core research problems, there are several intermediate milestones along the way that can be carved out as commercial offerings addressing specialized workloads or specific market segments. Either way, it will be an interesting space to watch!
