I was reading this article on the HP blog about the future of Software Defined Storage and how the VSA fits perfectly. Although I agree that a VSA (virtual storage appliance) could potentially be a Software Defined Storage solution I do not really agree with IDC quote used for the basis of this article and on top of that I think some crucial factors are left out. Lets start with the IDC quote:
IDC defines software-based storage as any storage software stack that can be installed on any commodity resources (x86 hardware, hypervisors, or cloud) and/or off-the-shelf computing hardware and used to offer a full suite of storage services and federation between the underlying persistent data placement resources to enable data mobility of its tenants between these resources.
Software Defined Storage solutions to me are not necessarily just a software-based storage product. Yes as said a VSA, or something like Virtual SAN (VSAN), could be part of your strategy but how about the storage you have today? Do we really expect customers to forget about their “legacy” storage and just write it off? Surely that won’t happen, especially not in this economical climate and considering many companies invested heavily in storage when they started virtualizing production workloads. What is missing in this quote, or in that article (although briefly mentioned in linked article), is the whole concept of “abstract, pool, automate”. I guess some of you will say, well that is VMware’s motto right? Well yes and no. Yes, “abstract, pool, automate” is the way of the future if you ask VMware. However this is not something new. Think about Software Defined Networking for instance, this is fully based on the “abstract, pool, automate” concept.
This had me thinking, what is missing today? There are various different initiatives around networking (openflow etc), but what about storage? I created this diagram that from a logical perspective explains what I think we need when it comes to Software Defined Storage. I guess this is what Ray Lucchesi is referring to in his article on Openflow and the storage world. Brent Compton from FusionIO also had an insightful article on this topic, worth reading.
If you look at my diagram… (yes FCoE/Infiniband/DAS etc is missing, not because it shouldn’t be supported but just to simplify the diagram) I drew a hypervisor at the top, reason for it being is that I have been in the hypervisor business for years but reality is this could be anything right. From a hypervisor perspective all you should see is a pool. A pool for your IO, a pool where you can store your data. Now this layer should provide you various things. Lets start at the bottom and work our way up.
- IO Connect = Abstract / Pool logic. This should allow you to connect to various types of storage abstract it for your consumers and pool it of course
- API = Do I need to explain this? It addresses the “automate” part but also probably even more importantly the integration aspect. Integration is key for a Software Defined Datacenter. The API(s) should be able to provide both north-, south-, east- and west-bound capabilities (for explanation around this read this article, although it is about Software Defined Networking it should get the point across)
- PE = Policy Engine, takes care of making sure your data ends up on the right tier with the right set of services
- DS = Data Services. Although the storage can provide specific data services this is also an opportunity for the layer in between the hypervisor and the storage. Matter a fact, data services should be possible on all layers: physical storage, appliance, host based etc
- $$ = Caching. I drew this out separately for a reason, yes it could be seen as a data service but I wanted it separately as for any layer inserted there is an impact on performance. An elegant and efficient caching solution at the right level could mitigate the impact. Again, caching could be part of this framework but could very well sit outside of it on the host-side or at the storage layer, or appliance based
One thing I want to emphasize here is the importance of the API. I briefly mentioned enabling north-, south-, east- and west-bound capabilities but in order for a solution like this to be successful this is a must. Although with automation you can go a long way, integration is key here! Whether it is seamless integration with your physical systems, integration with your virtual management solution or with an external cloud storage solution… These APIs should be able to provide that kind of functionality and be enable a true pluggable framework experience.
If you look at this approach, and I drew this out before I even looked at Virsto, it kind of resembles what Virsto offers today. Although there are components missing the concept is similar. It also resembles VVOLs in a way, which was discussed at VMworld in 2011 and 2012. I would say that what I described is a combination of both combined with what Software Defined Networking promises.
So where am I going with this? Good question, honestly I don’t know… For me articles like these are a nice way of blowing steam, get the creative juices going and open up the conversation. I do feel the world is ready for the next step from a Software Defined Storage perspective, I guess the really question is who is going to take this next step and when? I would love to hear your feedback.