Mapping Technology Trends to Enterprise Product Innovation

Scope: Focuses on enterprise platform software: Big Data, Cloud platforms, software-defined, micro-services, DevOps.
Why: We are living in an era of continuous change and a low barrier to entry. Net result: a lot of noise!
What: Sharing my expertise gained over nearly two decades in the skill of extracting the signal from the noise! More precisely, identifying shifts in ground realities before they become cited trends and pain-points.
How: NOT based on reading tea leaves! Instead, by synthesizing technical and business understanding of the domain at 500 ft., 5,000 ft., and 50K ft.

(Disclaimer: Personal views not representing my employer)

Friday, October 16, 2015

A new tip of the iceberg in Enterprise Storage!

Several decades ago, enterprise storage started as a naive layer in the software stack, responsible mainly for persistence and retrieval of data. The storage layer never understood what the data meant -- the knowledge of data semantics (e.g., this directory is an Exchange mailbox) lived higher up in the stack, closer to the application. We are now seeing the emergence of storage that is scratching the surface of becoming “data-aware” -- DataGravity, Cohesity, Rubrik, to name a few front-runners. Imagine the storage administrator now being able to define policies or optimize traditional workflows for backup, DR, tiering, etc., based on the “contents” of the data! The storage industry categorizes this segment as “Data Governance,” and it applies to both primary and secondary/backup data. Data Governance addresses the business pain-point of managing data assets under security, availability, integrity, compliance, and accessibility policies.

But why accomplish Data Governance at the lowest common denominator (instead of with specialized application-specific tools)? And why am I referring to this trend as the tip of the iceberg? Read on and your questions will be answered!


Let’s level-set with a few examples of how data-awareness addresses pain-points in existing storage workflows:
  • Let’s assume you are in an industry where regulatory compliance requires files containing a social security number to be encrypted -- today, this requires a periodic read of the changed data at the application level with some ad-hoc rule processing, which can be a nightmare to manage and debug when errors occur (to say the least).
  • Assume there is data corruption in your primary data and you need to find the backup in which a specific database table got written. If the storage system indexes the data contents of each backup, this could be a breeze, versus mounting every snapshot or nosing through the database transaction logs.
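To make the first example concrete, here is a minimal sketch of what a content-aware policy rule might look like inside a data-aware layer. This is purely illustrative -- the function name, the file-walking approach, and the simple US-SSN regex are my assumptions, not any vendor's actual API; a real system would scan at ingest time rather than crawl the filesystem.

```python
import re
from pathlib import Path

# Hypothetical sketch: flag files whose contents match a US Social Security
# Number pattern, so a policy engine could route them for encryption.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def files_needing_encryption(root: str) -> list[str]:
    """Return sorted paths under `root` whose contents contain an SSN-like string."""
    flagged = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable files are simply skipped in this sketch
        if SSN_PATTERN.search(text):
            flagged.append(str(path))
    return sorted(flagged)
```

The point of pushing such a rule into the storage layer is that it sees every write once, instead of each application re-implementing its own periodic crawl.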


So, why implement Data Governance at the lowest common denominator, especially when richer application-specific tools and data management platforms have existed for ages? There is one catch with these rich tools -- they are siloed. For the last decade, the mantra of enterprise storage was eliminating silos, but rapid innovation in hardware technologies and CAP trade-off variants is forcing enterprises to adopt multiple technologies optimized for specific usage and data models (i.e., the end of one-size-fits-all). The specialized application-specific tools operate within specific data-model silos. It's not an either/or question; the two approaches actually target different technical buyers (I hope) within the enterprise -- a storage administrator enforcing governance across all data silos versus a data scientist requiring detailed control and data analytics within a specific silo.


So why now? Could this not have been done two years back? The mainstream adoption of Software-defined Storage (SDS) is the catalyst making data analysis within storage increasingly plausible. SDS represents a new breed of scale-out storage systems built from the ground up, often internally leveraging a micro-services architecture. Data management services are much easier to incorporate as a micro-service, either in the IO path (ingestion/retrieval) or as background post-processing. Further, since SDS runs on commodity building blocks that are essentially a combination of compute, storage, memory, and network, the balance can be adjusted easily to accommodate compute-intensive activities (no sweat!).
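The inline-versus-background distinction above can be sketched as a toy ingestion pipeline. Everything here is hypothetical (class and method names are mine, and a dict stands in for the persistence layer); the sketch only shows the shape of the idea: data-management services register as hooks that run either synchronously in the write path or deferred as post-processing.

```python
class IngestPipeline:
    """Sketch of an SDS write path with pluggable data-management services."""

    def __init__(self):
        self.store = {}        # stand-in for the persistence layer
        self._inline = []      # services run synchronously in the IO path
        self._background = []  # services deferred to post-processing
        self._pending = []     # queued background work

    def register(self, service, inline=False):
        """Register a service callable taking (key, data)."""
        (self._inline if inline else self._background).append(service)

    def write(self, key, data):
        for service in self._inline:
            service(key, data)  # e.g., a compliance check before persisting
        self.store[key] = data
        # Defer heavier work (e.g., content indexing) out of the IO path.
        self._pending.extend((s, key, data) for s in self._background)

    def run_background(self):
        for service, key, data in self._pending:
            service(key, data)
        self._pending.clear()
```

A content indexer, for instance, would register with `inline=False` so that write latency is unaffected, while an encryption-policy check might register inline so non-compliant data never lands unencrypted.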


So finally, why is this the tip of the iceberg? There is an inherent blurring of database and storage technologies that will increasingly make richer data processing available as part of the storage layer. It's a perfect storm: the SDS micro-services architecture, the commoditization of data analytics libraries, and the business pain-point of gaining better control over, and deriving differentiated value from, the data. It wouldn't be too far-fetched to envision a marketplace of data-analysis micro-services being built around 2-3 leading SDS platforms (analogous to the marketplace of applications for iOS, Android, etc.).

Are we there yet? In my mind, we are far from the tipping point and still need some solid use-cases in multiple industry verticals demonstrating the value of building data-centric functionality at the lowest common denominator within the stack.
