I was reading this article on the HP blog about the future of Software Defined Storage and how the VSA fits perfectly into it. Although I agree that a VSA (virtual storage appliance) could potentially be a Software Defined Storage solution, I do not really agree with the IDC quote used as the basis for this article, and on top of that I think some crucial factors are left out. Let's start with the IDC quote:
IDC defines software-based storage as any storage software stack that can be installed on any commodity resources (x86 hardware, hypervisors, or cloud) and/or off-the-shelf computing hardware and used to offer a full suite of storage services and federation between the underlying persistent data placement resources to enable data mobility of its tenants between these resources.
Software Defined Storage solutions, to me, are not necessarily just software-based storage products. Yes, as said, a VSA or something like Virtual SAN (VSAN) could be part of your strategy, but what about the storage you have today? Do we really expect customers to forget about their “legacy” storage and just write it off? Surely that won’t happen, especially not in this economic climate and considering many companies invested heavily in storage when they started virtualizing production workloads. What is missing in this quote, and in that article (although it is briefly mentioned in the linked article), is the whole concept of “abstract, pool, automate”. I guess some of you will say, well, that is VMware’s motto, right? Well, yes and no. Yes, “abstract, pool, automate” is the way of the future if you ask VMware. However, this is not something new. Think about Software Defined Networking, for instance; it is fully based on the “abstract, pool, automate” concept.
This had me thinking: what is missing today? There are various initiatives around networking (OpenFlow etc.), but what about storage? I created a diagram that explains, from a logical perspective, what I think we need when it comes to Software Defined Storage. I guess this is what Ray Lucchesi is referring to in his article on OpenFlow and the storage world. Brent Compton from FusionIO also wrote an insightful article on this topic that is worth reading.
If you look at my diagram (yes, FCoE/InfiniBand/DAS etc. are missing, not because they shouldn’t be supported but just to simplify the diagram), you will see I drew a hypervisor at the top. The reason is that I have been in the hypervisor business for years, but in reality this could be anything. From a hypervisor perspective, all you should see is a pool: a pool for your IO, a pool where you can store your data. This layer should provide you with various things. Let’s start at the bottom and work our way up.
- IO Connect = Abstract / Pool logic. This should allow you to connect to various types of storage, abstract it for your consumers and, of course, pool it
- API = Do I need to explain this? It addresses the “automate” part but, probably even more importantly, also the integration aspect. Integration is key for a Software Defined Datacenter. The API(s) should be able to provide north-, south-, east- and west-bound capabilities (for an explanation, read this article; although it is about Software Defined Networking, it should get the point across)
- PE = Policy Engine, takes care of making sure your data ends up on the right tier with the right set of services
- DS = Data Services. Although the storage can provide specific data services, this is also an opportunity for the layer in between the hypervisor and the storage. As a matter of fact, data services should be possible on all layers: physical storage, appliance, host-based etc.
- $$ = Caching. I drew this out separately for a reason. Yes, it could be seen as a data service, but I wanted it separate because every layer you insert has an impact on performance. An elegant and efficient caching solution at the right level could mitigate that impact. Again, caching could be part of this framework, but it could very well sit outside of it: on the host side, at the storage layer, or appliance-based
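To make the layers in the list above a little more concrete, here is a minimal Python sketch under heavy assumptions. `Backend`, `StoragePool`, `Policy`, `PolicyEngine` and `ReadCache` are hypothetical names made up for illustration, not any product's API, and the tier labels and the set of data services the layer can supply itself are invented as well. It shows the consumer asking for a tier plus services without naming an array, the policy engine choosing from a pool that includes a legacy array, the layer filling in data services the array lacks, and a small LRU read cache standing in for $$.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Backend:
    """IO Connect: one storage system (legacy array, VSA, NAS) behind a common shape (hypothetical)."""
    name: str
    tier: str                  # illustrative labels such as "gold" or "bronze"
    free_gb: int
    services: frozenset        # data services the backend offers natively


@dataclass
class Policy:
    """What the consumer asks for; it never names a specific array or LUN."""
    tier: str
    size_gb: int
    required_services: frozenset = frozenset()


class StoragePool:
    """Abstract + pool: consumers see one pool instead of individual backends."""

    def __init__(self, backends: List[Backend]):
        self.backends = backends

    def candidates(self, policy: Policy) -> List[Backend]:
        return [b for b in self.backends
                if b.tier == policy.tier and b.free_gb >= policy.size_gb]


class PolicyEngine:
    """PE: puts data on the right tier; DS gaps are filled by the layer itself."""

    # Assumption: the services the in-between layer can provide on top of any array
    LAYER_SERVICES = frozenset({"snapshot", "replication"})

    def __init__(self, pool: StoragePool):
        self.pool = pool

    def place(self, policy: Policy) -> Optional[dict]:
        best = None
        for backend in self.pool.candidates(policy):
            missing = policy.required_services - backend.services
            if not missing <= self.LAYER_SERVICES:
                continue  # neither the array nor the layer can provide a service
            # Prefer the backend that already covers the most services natively
            if best is None or len(missing) < len(best[1]):
                best = (backend, missing)
        if best is None:
            return None
        backend, missing = best
        backend.free_gb -= policy.size_gb
        return {"backend": backend.name, "layer_services": sorted(missing)}


class ReadCache:
    """$$: a small LRU read cache to offset the cost of the extra layer."""

    def __init__(self, capacity: int, read_from_backend: Callable[[int], str]):
        self.capacity = capacity
        self.read_from_backend = read_from_backend
        self.blocks: "OrderedDict[int, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int) -> str:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.read_from_backend(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data


if __name__ == "__main__":
    pool = StoragePool([
        Backend("legacy-fc-array", "gold", 500, frozenset({"replication"})),
        Backend("vsa-01", "gold", 200, frozenset()),
        Backend("nas-01", "bronze", 2000, frozenset({"snapshot"})),
    ])
    engine = PolicyEngine(pool)
    print(engine.place(Policy("gold", 100, frozenset({"replication", "snapshot"}))))
    # → {'backend': 'legacy-fc-array', 'layer_services': ['snapshot']}
```

Note that the existing "legacy" array wins here because it already offers replication natively, and the layer only adds snapshots on top. That is the point of abstracting and pooling rather than replacing what is already on the floor.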
One thing I want to emphasize here is the importance of the API. I briefly mentioned north-, south-, east- and west-bound capabilities, but for a solution like this to be successful they are a must. Although you can go a long way with automation, integration is key here! Whether it is seamless integration with your physical systems, with your virtual management solution or with an external cloud storage solution, these APIs should be able to provide that kind of functionality and enable a true pluggable framework experience.
If you look at this approach (and I drew this out before I even looked at Virsto), it kind of resembles what Virsto offers today. Although some components are missing, the concept is similar. It also resembles VVOLs in a way, which were discussed at VMworld in 2011 and 2012. I would say that what I described is a combination of both, plus what Software Defined Networking promises.
So where am I going with this? Good question; honestly, I don’t know… For me, articles like these are a nice way of letting off steam, getting the creative juices going and opening up the conversation. I do feel the world is ready for the next step from a Software Defined Storage perspective; I guess the real question is who is going to take this next step, and when? I would love to hear your feedback.
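A rough sketch of what a "pluggable framework" with north-, south-, east- and west-bound hooks could look like. It assumes a hypothetical `SDSController` with made-up method names (`register_plugin`, `add_peer`, `subscribe`, `provision`); real products expose their own, different APIs. South-bound plugins wrap storage systems, an east/west-bound peer lets one controller hand a request to another site or an external cloud, and north-bound calls and event subscriptions are what the hypervisor and management tooling would consume.

```python
from typing import Callable, Dict, List, Protocol


class StoragePlugin(Protocol):
    """South-bound contract a storage system (array, VSA, cloud store) would implement."""

    def create_volume(self, name: str, size_gb: int) -> str: ...


class SDSController:
    """Hypothetical control layer exposing north-, south-, east- and west-bound hooks."""

    def __init__(self, site: str):
        self.site = site
        self._plugins: Dict[str, StoragePlugin] = {}
        self._peers: List["SDSController"] = []
        self._subscribers: List[Callable[[str, str], None]] = []

    # South-bound: plug a storage system into the framework
    def register_plugin(self, kind: str, plugin: StoragePlugin) -> None:
        self._plugins[kind] = plugin

    # East/west-bound: federate with a peer controller (another site, an external cloud)
    def add_peer(self, peer: "SDSController") -> None:
        self._peers.append(peer)

    def can_serve(self, kind: str) -> bool:
        return kind in self._plugins

    # North-bound: management tools subscribe to provisioning events...
    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self._subscribers.append(callback)

    # ...and consumers request capacity without knowing which system provides it
    def provision(self, name: str, size_gb: int, kind: str) -> str:
        if self.can_serve(kind):
            handle = self._plugins[kind].create_volume(name, size_gb)
        else:
            peer = next((p for p in self._peers if p.can_serve(kind)), None)
            if peer is None:
                raise LookupError(f"no plugin or peer serves {kind!r}")
            handle = peer.provision(name, size_gb, kind)
        for callback in self._subscribers:
            callback(name, handle)
        return handle


class FakeArray:
    """Stand-in for a vendor plugin; a real one would call the array's own API."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def create_volume(self, name: str, size_gb: int) -> str:
        return f"{self.prefix}:{name}:{size_gb}GB"


if __name__ == "__main__":
    local, cloud = SDSController("dc1"), SDSController("cloud")
    local.register_plugin("block", FakeArray("legacy-array"))
    cloud.register_plugin("object", FakeArray("s3-like"))
    local.add_peer(cloud)
    local.subscribe(lambda name, handle: print("provisioned", name, "->", handle))
    local.provision("vm01-disk", 40, "block")   # served by the local legacy array
    local.provision("backup01", 500, "object")  # handed to the east/west peer
```

The design choice worth noting is that the consumer only ever talks to the north-bound `provision` call; whether a legacy array, a VSA or an external cloud ends up serving the request is decided behind the API.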
Francesco Mazzotta says
I agree with you in relation to “Now”, whilst companies are still looking at adopting the best hardware/software combinations possible in terms of ROI.
As for the “Future”, I can see how manufacturers are putting more and more emphasis on software-based solutions, from SDDC to VDI to Cloud Portals to Replication and so on. However, users should not forget that although you can play around with software and make it as efficient and performant as possible, you can’t run a Formula 1 race with a Fiat Cinquecento, no matter how good your electronics are.
Jason Boche says
As always, thanks for sharing your thoughts, Duncan. Although we’re only at the beginning of the Software Defined movement, I’ve been waiting for someone to step forward with a responsible customer perspective and acknowledge the legacy storage vendors in the room, which happen to occupy a good share of floor space in datacenters of nearly all sizes.
James Hess says
Yeah, I am not in agreement with the IDC definition either.
I would say “software-defined” means that the storage is actually being _defined_ by software now: a software-based layer of indirection or storage virtualization that presents the storage as something new and frees it from traditional hardware structures such as “LUNs” or “Volumes”.
With software defined storage, an application environment no longer cares where or how a block of data is stored, nor is there such a concept as a “disk” or “volume”; as far as the consuming OS or application is concerned, it is just a bucket of data.
Amazon S3 seems like a good example of software-defined storage.
However, their implementation is limited in some respects.
Modern OSes such as Windows aren’t currently able to really use software defined storage for operating system storage and storage of binary applications; they are highly reliant on the old hardware-defined notion of a “Disk” or “Volume”.
Arun Raju says
The building blocks would still remain the same in my opinion irrespective of the vendor or the solution. We need more focus on converging the VM format and its associated files.
I believe that much time is lost converting from one virtual disk format to another. Whether or not we converge on technology, at least in coming versions we need one format and extension for virtual disks. The market is talking about Cloud (be it from OpenStack, VMware or Amazon). I opine that unless we have certain standards defined for VMs and vApps, things will remain cumbersome for IT Pros, who waste a lot of time just converting virtual disks. Remember the ISO format and the benefits it gives us when mounting media and installing an OS.
OVA and OVF are fine, although the XVA format is also trying to come in.
Using a VSA as software defined storage is quite intuitive. That’s what Virsto, Tintri, DataCore, etc. have been doing. Basically, it makes storage hardware irrelevant by using off-the-shelf hardware with all the services, IOPS, etc.
Now, the idea behind software defined networking was to take the control part, which was fixed, and make it dynamic by replacing it with software on a central controller. I just think of it as a trade-off between performance and flexibility. Nothing new. The same old ASIC vs. software argument holds here.
With SDN you could use off-the-shelf hardware as switches/routers, with software such as openVswitch and openVrouter, similar to VSAs.
The current storage solutions (NAS, SAN, object, etc.) do not have an ASIC or fixed part. Everything is already in software (pooling, auto-tiering, caching, unified storage, deduplication, etc.). What you have shown in the diagram, I think, is more or less already provided by all top-end storage vendors. So where is the software part of software defined storage missing from current solutions?
It’s already there in software. Then why call it software defined storage?
Then there is yet another confusion between storage hypervisor/storage virtualization and software defined storage.
Duncan Epping says
Just a slight correction: Tintri is not really a VSA, right? It is still a box.
Hold on, you are focusing on the wrong part of Software Defined Storage. The key word is “Defined”: creating new workloads, storing data, objects etc. through the use of policies or profiles, fully automated and integrated.
Using your logic, the same could be said about SDN: a switch already contains software, right?
Jason Boche says
Many vendors have already jumped on the bandwagon, stating they are “Software Defined Ready” because they fit a very loose interpretation of what exactly SD is (i.e. their widget has software baked in, like the switch you mention). IMO, VMware coined the moniker and is responsible for firming up what it is and what it isn’t. If the movement is truly more about policy- and profile-driven abstraction, a more meaningful acronym could have been chosen. I’m not jumping to any conclusions yet; there’s a lot that remains to be seen and proven.
Very interesting article!
IMHO the success of SDS will depend on the type of integration. “abstract, pool, automate” should be changed to “abstract, pool, automate, integrate”. We will see faster CPUs, more RAM and faster networks, but will we see 10x faster storage appliances? I don’t think so. Essential parts of SDS ($$, DS, OpenIO) should be part of the hypervisor and/or the operating system (e.g. the big X versions). This would allow a slim architecture that doesn’t waste performance.
Maybe SDS also leads to a shift of intelligence from the firmware/chips of storage devices to the hypervisor? But what does that mean? Different SDS plugins for different hypervisors? What will happen if you connect a hypervisor using an intelligent SDS system to an old-school storage system that also contains a lot of storage intelligence?
Abdullah Abdullah says
I totally agree with you. I wouldn’t think of just shoving legacy storage out of the picture; personally, I believe it will always remain in the back end, abstracted in the front end.
The key lies within the APIs and how vendors bend and shape them to get the best out of the physical storage box. Even with VSAs we’d still need a controller and disks in the back end, so the Software Defined Datacenter will always rely on certain aspects of hardware.
Good idea for a webinar discussion, I’d say. Thank you =).
John Martin says
IMHO, VSAs will be an important part of the software defined storage (SDS) landscape, but by no means are they the complete story. What is lacking in SDS is the equivalent of flow tables in switches. If you go with the whole “separate the control plane from the data plane” definition of software defined anything, then you could reasonably argue that this is exactly what things like the VERITAS volume manager and file system did way back in the ’80s. For a whole stack of good reasons, people chose to split the responsibility for managing that functionality, moving it increasingly into the storage and application layers and leaving those products with increasingly niche roles. The advent of SDS might swing that pendulum back towards ’80s-style architecture for a while, but people tend towards vertically integrated solutions when the complexity of managing and integrating solutions themselves becomes economically unviable, and designing a reliable, high-performance storage solution at large scale that caters for a large variety of workload types is very, very hard to do well.
Going back to the lack of a storage equivalent of flow tables: the trouble with SDS is that storage requirements are much less homogeneous than switching requirements and much harder to reduce to a small number of discrete functions that can be accelerated in hardware. I think that over time these will become more obvious, the first and most obvious being copy offload/management, but these requirements will probably keep evolving.
Rather than focusing on building an industry/standards-defined theoretical model and trying to wedge/judge all designs by that model, I think we’d be better served by loosening up the vertical integration of storage systems and finding a variety of creative ways to leverage the large amounts of cheap CPU/memory/cache/disk sitting in the virtualisation layer. VSAs are a fairly coarse-grained way of achieving this, but many of them don’t elegantly leverage tightly/vertically integrated infrastructure to accelerate or drive efficiencies where that is appropriate.
For example, there are ways of using the hypervisor resources as a “data plane” and leaving the control plane in the centralised array, such as NetApp FlashAccel. This is somewhat counter-intuitive to the existing “control plane lives in the hypervisor” model, as the cache is seen as an extension of the hardware array rather than the array being seen as an extension of the hypervisor. To be fair, the model isn’t that pure, as control portions are distributed between the array and the hypervisor. My point is that the boundaries become a lot fuzzier and functionality will be divided and combined in a variety of interesting ways, and as long as storage is asked to perform so many different tasks, I think that’s a good thing.
While I love VSAs as a conceptually neat little package of functionality with tightly defined boundaries (the DataONTAP VSAs in particular, especially if you’re aware of their roadmap), I think that data and storage management will for the foreseeable future be a shared responsibility between applications, hypervisors, operating systems and arrays. The biggest challenge we face is coordinating these responsibilities and choosing the most efficient and automatable ways of combining them to give customers what they need without needlessly locking them into inflexible architecture choices.
John Nicholson says
Looking at IDC’s definition, most of the legacy vendors would qualify. EMC has only a handful of proprietary hardware components in their VMAXes (think a single ASIC for encryption, and a somewhat custom crossbar switch?). HDS and 3PAR don’t qualify (use of ASICs and FPGAs), but pretty much everyone else out there already runs on commodity hardware; the only things stopping you are their go-to-market strategy, their desire to avoid supporting junk hardware, and owning the physical support. Given that a lot of vendors have demo VSAs floating around either externally or internally, I think this definition pretty much covers all storage companies. I’m not sure whether I want to define SDS as just moving blocks around in a really smart way, or whether I prefer to include the file-level abstraction and management that object stores and other new systems have, as that is the real value of software and data to businesses, not just moving blocks faster.
John Martin says
I like your perspective, but I think there is a distinction between commodity hardware and hardware built from commodity components. That raises the question, though: where exactly do you draw the line? And THAT is the billion-dollar Software Defined Question… where do you draw the line between the components/structural functions? I reckon that in the next few years there will be plenty of “religious” arguments to be had around that very question.
In the meantime, I’m really enjoying the open debates… interesting times indeed.
Michael Nauen says
The hypervisor is a virtual room. Most people did not understand that. In a virtual room you can create any object; VSAN is only the next step. An x86 server is a virtual room too. Today an x86 server without hard disks and SSDs takes up physical space that is wasted. A VSAN spanning several x86 boxes completely absorbs the physical space of today’s SAN environments.
The next big thing in virtualization is the aggregation of all x86 boxes.
Douglas O'Flaherty says
I’ve proposed that Software Defined Storage needs a maturity model, where software on commodity hardware is the ‘baby step’ while storage abstraction and policy based placement across heterogenous systems is the mature solution. … http://www.tonian.com/blog-post/software-defined-storage-finally/