
Yellow Bricks

by Duncan Epping


Software Defined

RE: Is VSA the future of Software Defined Storage? (OpenIO)

Duncan Epping · Apr 24, 2013 ·

I was reading this article on the HP blog about the future of Software Defined Storage and how the VSA fits perfectly. Although I agree that a VSA (virtual storage appliance) could potentially be a Software Defined Storage solution, I do not really agree with the IDC quote used as the basis of that article, and on top of that I think some crucial factors are left out. Let's start with the IDC quote:

IDC defines software-based storage as any storage software stack that can be installed on any commodity resources (x86 hardware, hypervisors, or cloud) and/or off-the-shelf computing hardware and used to offer a full suite of storage services and federation between the underlying persistent data placement resources to enable data mobility of its tenants between these resources.

Software Defined Storage solutions to me are not necessarily just a software-based storage product. Yes, as said, a VSA, or something like Virtual SAN (VSAN), could be part of your strategy, but how about the storage you have today? Do we really expect customers to forget about their “legacy” storage and just write it off? Surely that won’t happen, especially not in this economic climate, and considering many companies invested heavily in storage when they started virtualizing production workloads. What is missing in this quote, and in that article (although briefly mentioned in the linked article), is the whole concept of “abstract, pool, automate”. I guess some of you will say, well, that is VMware’s motto, right? Well, yes and no. Yes, “abstract, pool, automate” is the way of the future if you ask VMware. However, this is not something new. Think about Software Defined Networking, for instance; it is fully based on the “abstract, pool, automate” concept.

This had me thinking: what is missing today? There are various initiatives around networking (OpenFlow etc.), but what about storage? I created a diagram that, from a logical perspective, explains what I think we need when it comes to Software Defined Storage. I guess this is what Ray Lucchesi is referring to in his article on OpenFlow and the storage world. Brent Compton from FusionIO also had an insightful article on this topic, worth reading.

[Diagram: future of Software Defined Storage – OpenIO?]

If you look at my diagram… (yes, FCoE/InfiniBand/DAS etc. are missing, not because they shouldn’t be supported but simply to keep the diagram readable) I drew a hypervisor at the top. The reason is that I have been in the hypervisor business for years, but in reality this could be anything. From a hypervisor perspective all you should see is a pool: a pool for your IO, a pool where you can store your data. Now, this layer should provide you various things. Let's start at the bottom and work our way up.

  • IO Connect = Abstract / Pool logic. This should allow you to connect to various types of storage, abstract it for your consumers, and of course pool it
  • API = Do I need to explain this? It addresses the “automate” part, but also, probably even more importantly, the integration aspect. Integration is key for a Software Defined Datacenter. The API(s) should provide north-, south-, east- and west-bound capabilities (for an explanation around this read this article; although it is about Software Defined Networking it should get the point across)
  • PE = Policy Engine. Takes care of making sure your data ends up on the right tier with the right set of services
  • DS = Data Services. Although the storage can provide specific data services, this is also an opportunity for the layer in between the hypervisor and the storage. As a matter of fact, data services should be possible at all layers: physical storage, appliance, host based etc.
  • $$ = Caching. I drew this out separately for a reason. Yes, it could be seen as a data service, but I wanted it separate because any layer inserted has an impact on performance. An elegant and efficient caching solution at the right level could mitigate that impact. Again, caching could be part of this framework, but could very well sit outside of it on the host side, at the storage layer, or appliance based. (A rough code sketch of how these layers could compose follows below.)
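
To make this layering a bit more tangible, below is a minimal Python sketch of how these components could compose. Every class and method name here is made up purely for illustration; it is a thought experiment, not a description of any existing product.

    class IOConnect:
        """Abstracts heterogeneous backends (NFS, iSCSI, FC, ...) into one pool."""
        def __init__(self):
            self.backends = []  # each backend: {"name", "tier", "free_gb", "services"}

        def register(self, backend):
            self.backends.append(backend)

        def pool(self):
            # Consumers (e.g. a hypervisor) only ever see the aggregate pool.
            return {"free_gb": sum(b["free_gb"] for b in self.backends)}

    class PolicyEngine:
        """PE: makes sure data ends up on a tier with the right set of services."""
        def place(self, backends, policy):
            candidates = [b for b in backends
                          if policy["services"] <= b["services"]   # required services present
                          and b["free_gb"] >= policy["size_gb"]]   # enough capacity
            if not candidates:
                raise RuntimeError("no backend satisfies the policy")
            return max(candidates, key=lambda b: b["free_gb"])

    class StorageAPI:
        """North-bound API: the 'automate' part. South-bound calls to the arrays
        and east/west-bound calls to peer controllers are left out for brevity."""
        def __init__(self, io_connect, policy_engine):
            self.io = io_connect
            self.pe = policy_engine

        def create_object(self, policy):
            target = self.pe.place(self.io.backends, policy)
            target["free_gb"] -= policy["size_gb"]
            return {"backend": target["name"], "size_gb": policy["size_gb"]}

    api = StorageAPI(IOConnect(), PolicyEngine())
    api.io.register({"name": "array-a", "tier": "gold", "free_gb": 2048,
                     "services": {"replication", "snapshots"}})
    print(api.create_object({"size_gb": 100, "services": {"replication"}}))

Data services and caching could be modeled as additional layers wrapping create_object, which is exactly why I drew them as separate boxes in the diagram.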

One thing I want to emphasize here is the importance of the API. I briefly mentioned enabling north-, south-, east- and west-bound capabilities, but for a solution like this to be successful this is a must. Although automation gets you a long way, integration is key here! Whether it is seamless integration with your physical systems, integration with your virtual management solution, or with an external cloud storage solution… these APIs should be able to provide that kind of functionality and enable a true pluggable framework experience.

If you look at this approach (and I drew this out before I even looked at Virsto), it kind of resembles what Virsto offers today. Although there are components missing, the concept is similar. It also resembles VVOLs in a way, which were discussed at VMworld in 2011 and 2012. I would say that what I described is a combination of the two, plus what Software Defined Networking promises.

So where am I going with this? Good question; honestly, I don’t know… For me, articles like these are a nice way of blowing off steam, getting the creative juices going, and opening up the conversation. I do feel the world is ready for the next step from a Software Defined Storage perspective; I guess the real question is who is going to take this next step, and when? I would love to hear your feedback.

Install and configure Virsto

Duncan Epping · Apr 23, 2013 ·

Last week I was in Palo Alto and I had a discussion about Virsto. Based on that discussion I figured it was time to set up Virsto in my lab. I am not going to go into much detail about what Virsto is or does, as I already did that in this article around the time VMware acquired Virsto. On top of that, there is an excellent article by Cormac which provides an awesome primer. But let's use two quotes from both of the articles to give an idea of what to expect:

source: yellow-bricks.com
Virsto has developed an appliance and a host level service which together form an abstraction layer for existing storage devices. In other words, storage devices are connected directly to the Virsto appliance and Virsto aggregates these devices into a large storage pool. This pool is in turn served up to your environment as an NFS datastore.

source: cormachogan.com
Virsto Software aims to provide the advantages of VMware’s linked clones (single image management, thin provisioning, rapid creation) but deliver better performance than EZT VMDKs.

Just to give an idea, here is what I have running in my lab:

  • 3 x ESXi 5.1 hosts
  • 1 x vCenter Server (Windows install, as the VCVA isn't supported)
  • VNX 5500

After quickly glancing through the quickstart guide I noticed I needed a Windows VM to install some of Virsto’s components; that VM is what Virsto refers to as the “vMaster”. I also need a bunch of empty LUNs which will be used for storage. I also noticed references to Namespace VMs and vIOService VMs. Hmmm, it sounds complicated, but is it? I am guessing these components will need to be connected. The diagram below gives the idea; note that the empty LUNs will be automatically connected to the vIOService VMs. I did not add those to the diagram as that would make it more complex than needed.

[Diagram: Virsto network architecture]

[Read more…]

Tintri releases version 2.0 – Replication added!

Duncan Epping · Apr 8, 2013 ·

I have never made it a secret that I am a fan of Tintri. I just love their view on storage systems and the way they decided to solve specific problems. When I was in Palo Alto last month I had the opportunity to talk to the folks of Tintri again about what they were working on. Of course we had a discussion about the Software Defined Datacenter, and more specifically Software Defined Storage and what Tintri would bring to the SDS era. As all of that was under strict NDA I can’t share it, but what I can share are some cool details of what Tintri has just announced: version 2.0 of their storage system.

For those who have never even looked into Tintri, I suggest you catch up by reading the following two articles:

  1. Tintri – virtual machine aware storage
  2. Tintri follow-up

When I was briefed initially about Tintri back in 2011, one of the biggest areas of improvement I saw was availability. Two things were on my list to be solved: the first was the “single controller” approach they took, which was solved back in 2011. The other feature I missed was replication. Replication is the main feature announced today, and it will be part of the 2.0 release of their software. What I loved about Tintri is that all the data services they offered operate at the virtual machine level, and of course the same applies to the replication announced today.

Tintri offers asynchronous replication which can go down to a recovery point objective (RPO) of 15 minutes. Of course I asked if there were plans on bringing this down, and indeed this is planned, but I can’t say when. What I liked about this replication solution is that as data is deduplicated and compressed, the amount of replication traffic is kept to a minimum. Let me rephrase that: globally deduplicated… meaning that if a block already exists at the DR site, it will not be replicated to that site. This will definitely have a positive impact on your bandwidth consumption; Tintri has seen up to 95% reduction in WAN bandwidth consumption. The diagram below shows how this works.
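
To illustrate the concept, here is a simplified Python sketch of globally deduplicated replication: fingerprint every block and only ship the blocks the DR site does not already hold. This is my own toy illustration of the general technique, not Tintri's actual implementation.

    import hashlib

    def fingerprint(block: bytes) -> str:
        return hashlib.sha256(block).hexdigest()

    def replicate(snapshot_blocks, remote_index):
        """snapshot_blocks: block payloads of the snapshot to replicate.
        remote_index: set of fingerprints already present at the DR site."""
        shipped = 0
        for block in snapshot_blocks:
            fp = fingerprint(block)
            if fp not in remote_index:   # block is new to the DR site
                remote_index.add(fp)     # in reality: send the block + metadata
                shipped += 1
            # blocks the DR site already holds are referenced, never re-sent
        return shipped

    blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # one duplicate block
    remote = {fingerprint(b"B" * 4096)}               # DR site already has "B"
    print(replicate(blocks, remote))                  # only 1 block crosses the WAN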

The nice thing about the replication technique Tintri offers is that it is well integrated with VMware vSphere, and it thus offers “VM consistent” snapshots by leveraging VMware’s quiescing technology. My next obvious question was: what about Site Recovery Manager? As failover is on a per-VM basis, orchestrating / automating this would be a welcome option. Tintri is still working on this and hopes to add support for Site Recovery Manager soon. Another feature I would like to see added is the grouping of virtual machines for replication consistency; again, this is on the roadmap and hopefully will be added soon.

One of the other cool features added with this release is remote cloning. Remote cloning basically allows you to clone a virtual machine / template to a different array. Those who have multiple vCenter Server instances in their environment know what a pain this can be, hence the reason I feel this is one of those neat little features which you will appreciate once you have used it. It would be great if this functionality could be integrated within the vSphere Web Client as a “right click”. Judging by the comments made by the Tintri team, I would expect that they are already working on deeper / tighter integration with the Web Client, and I can only hope a vSphere Web Client plugin will be released soon so that all granular VM-level data services can be managed from a single console.

All in all, a great new release by Tintri; if you ask me this release is three huge steps forward!

Software Defined Storage; just some random thoughts

Duncan Epping · Apr 5, 2013 ·

I have been reading many articles over the last weeks on Software Defined Storage, and I wrote an article on this topic a couple of weeks ago. While reading up, one thing that stood out was that every single storage/flash vendor out there has jumped on the bandwagon and (ab)uses this term wherever and whenever possible. In most of those cases, however, the term isn’t backed by SDS-enabling technology or even a strategy, but let's not get into a finger-pointing contest, as I think my friends who work for storage vendors are more effective at that.

The article which triggered me to write this one was released a week and a half ago by CRN. It was a good read, so don’t expect me to tear it down; it just had me thinking about various things, and what better way to clear your head than to write an article about it. Let's start with the following quote:

While startups and smaller software-focused vendors are quick to define software-defined storage as a way to replace legacy storage hardware with commodity servers, disk drives and flash storage, large storage vendors are not giving ground in terms of the value their hardware offers as storage functionality moves toward the software layer.

Let me also pull out this comment by Keith Norbie in the same article, as I think Keith hit the nail on the head:

Norbie said to think of the software-defined data center as a Logitech Harmony remote which, when used with a home theater system, controls everything with the press of a button.

If you take a look at how Keith’s quote relates to Software Defined Storage, it would mean that you should be able to define EVERYTHING via software. Just like you can simply program the Logitech Harmony remote to work with all your devices, you should be able to configure your platform in such a way that spinning up new storage objects can be done by the touch of one button! Now, getting back to the first quote: whether functionality moves out of a storage system to a management tool or even to the platform is irrelevant if you ask me. If your storage system has an API and it allows you to do everything programmatically, you are halfway there.

I understand that many of the startups like to make potential customers believe differently, but the opportunity is there for everyone if you ask me. Yes, that includes old-timers like EMC / NetApp / IBM (etc.) and their “legacy” arrays (as some of the startups like to label them). Again, don’t get me wrong… playing in the SDS space will require significant changes to most storage platforms, as most were never architected for this use case. Most are currently not capable of creating thousands of new objects programmatically. Many don’t even have a public API.

However, what is missing today is not just a public API on most storage systems; it is also the platform, which doesn’t allow you to efficiently manage these storage systems through those APIs. When I say platform I refer to vSphere, but I guess the same applies to Hyper-V, KVM, Xen etc. Although various sub-components are already there, like the vSphere APIs for Array Integration (VAAI) and the vSphere APIs for Storage Awareness (VASA), there are still a lot of capabilities missing. A good example would be defining and setting specific data services at a virtual disk level of granularity, or end-to-end Quality of Service for virtual disks or virtual machines, or the automatic instantiation of storage objects during virtual machine provisioning without manual action required from your storage admin. Of course, all of this from a single management console…
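
Purely to illustrate the gap, here is a hypothetical Python sketch of what provisioning could look like if the platform exposed data services at the virtual disk level. None of these names exist in vSphere; they are invented for the sake of the argument.

    from dataclasses import dataclass

    @dataclass
    class DiskPolicy:
        replication: bool = False        # replicate this disk to a DR site
        iops_reservation: int = 0        # hypothetical end-to-end QoS reservation
        snapshots: bool = False

    def provision_vm(name, disks):
        """Each virtual disk carries its own policy; the matching storage
        object is instantiated automatically, with no manual action from
        the storage admin."""
        vm = {"name": name, "disks": []}
        for size_gb, policy in disks:
            vm["disks"].append({"size_gb": size_gb, "services": policy})
        return vm

    vm = provision_vm("db01", [
        (40, DiskPolicy(snapshots=True)),                            # OS disk
        (500, DiskPolicy(replication=True, iops_reservation=2000)),  # data disk
    ])
    print(vm)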

If you look at VMware vSphere and what is being worked on for the future, you know those capabilities will come at some point; in this case I am referring to what was previewed at VMworld as “virtual volumes” (sometimes also referred to as VVOLs), but this will take time… Yes, I know some storage vendors already offer some of this granularity (primarily the startups out there), but can you define/set this from your favorite virtual infrastructure management solution during the provisioning of a new workload? Or do you need to use various tools to get the job done? If you can define QoS on a per-VM basis, is it end-to-end? What about availability / disaster recovery: do they offer a full solution for that? If so, is it possible to simply integrate this with other solutions like, for instance, Site Recovery Manager?

I think exciting times are ahead of us; but let's all be realistic… they are ahead of us. There is no “Logitech Harmony” experience yet, but I am sure we will get there in the (near) future.

Why the world needs Software Defined Storage

Duncan Epping · Mar 6, 2013 ·

Yesterday I was at a Software Defined Datacenter event organized by IBM and VMware. The famous Cormac Hogan presented on Software Defined Storage, and I very much enjoyed hearing about the VMware vision and of course Cormac’s take on it. Coincidentally, last week I read this article by long-time community guru Jason Boche on VAAI and the number of VMs per volume, and after a discussion with a customer yesterday (at the event) about their operational procedures for provisioning new workloads, I figured it was time to write down my thoughts.

I have seen many different definitions so far for Software Defined Storage, and I guess there is a grain of truth in all of them. Before I explain what it means to me, let me describe the challenges people commonly face today.

In a lot of environments, managing storage and the associated workloads is a tedious task. It is not uncommon to see large spreadsheets with a long list of LUNs, IDs, capabilities, groupings, and whatever else is relevant to them and their workloads. These spreadsheets are typically used to decide where to place a virtual machine or virtual disk: based on the requirements of the application, a specific destination will be selected. On top of that, a selection will need to be made based on the currently available disk space of a datastore and of course the current IO load. You do not want to randomly place your virtual machine and find out two days later that you are running out of disk space… Well, that is if you have a relatively mature provisioning process. Of course it is also not uncommon to just pick a random datastore and hope for the best.

To be honest, I can understand why many people randomly provision virtual machines. Keeping track of virtual disks, datastores, performance, disk space and other characteristics… it is simply too much, and boring. Didn’t we invent computer systems to do these repeatable, boring tasks for us? That leads us to the question: where and how should Software Defined Storage help you?

A common theme recurring in many “Software Defined” solutions presented by VMware is:

Abstract, Pool, Automate.

This also applies to Software Defined Storage in my opinion. These are three basic requirements that a Software Defined Storage solution should meet. But what does this mean and how does it help you? Let me try to make some sense out of that nice three-word marketing slogan:

Software Defined Storage should enable you to provision workloads to a pool of virtualized physical resources based on service level agreements (defined in a policy) in an automated fashion.

I understand that is a mouthful, so let's elaborate a bit more. Think about the challenges I described above, or what Jason described with regards to “VMs per volume” and how various components can impact your service level. A Software Defined Storage (SDS) solution should be able to intelligently place virtual disks (virtual machines / vApps) based on the policy selected for the object (virtual disk / machine / appliance). These policies typically contain the characteristics of the provided service level. On top of that, a Software Defined Storage solution should take risks / constraints into account, meaning that you don’t want your workload to be deployed to a volume which is running out of disk space, for instance.

What about those characteristics, what are those? Characteristics could be anything; here are two simple examples to make it a bit more concrete (a small placement sketch follows the list):

  • Does your application require recoverability after a disaster? –> SDS selects a destination which is replicated, or instructs the storage system to create a replicated object for the VM
  • Does your application require a certain level of performance? –> SDS selects a destination that can provide this performance, or instructs the storage system to reserve storage resources for the VM
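
As a small illustration, here is a minimal Python sketch of that placement logic: match the policy's characteristics against datastore capabilities while respecting constraints such as free space and current IO load. The datastore records and thresholds are invented for the example; a real implementation would pull capabilities via something like VASA and runtime state from the arrays.

    datastores = [
        {"name": "ds-gold",   "replicated": True,  "max_iops": 20000,
         "free_gb": 120, "io_load": 0.70},
        {"name": "ds-silver", "replicated": False, "max_iops": 8000,
         "free_gb": 900, "io_load": 0.30},
    ]

    def place(policy, stores):
        """Pick a datastore whose capabilities satisfy the policy without
        violating free-space or load constraints."""
        ok = [d for d in stores
              if (not policy["replicated"] or d["replicated"])  # characteristic: DR
              and d["max_iops"] >= policy["min_iops"]           # characteristic: performance
              and d["free_gb"] >= policy["size_gb"] * 1.2       # constraint: capacity headroom
              and d["io_load"] < 0.85]                          # constraint: current IO load
        if not ok:
            raise RuntimeError("no datastore satisfies the policy")
        return min(ok, key=lambda d: d["io_load"])  # prefer the least loaded candidate

    print(place({"replicated": True, "min_iops": 5000, "size_gb": 60}, datastores))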

Now, this all sounds a bit vague, but I am purposely trying to avoid using product or feature names. Software Defined Storage is not about a particular feature, product, or storage system. Although I dropped the word policy, note that enabling Profile Driven Storage within vCenter Server does not give you a Software Defined Storage solution. It shouldn’t matter either (to a certain extent) whether you are using EMC, NetApp, Nimbus, a VMware software solution, or any of the other thousands of different storage systems out there. Any of those systems, or even a combination of them, should work in the software defined world. To be clear, in my opinion there isn’t (today) such a thing as a Software Defined Storage product; it is a strategy. It is a way of operating that particular part of your datacenter.

To be fair, there is a huge difference between the various solutions. There are products and features out there that will enable you to build a solution like this and transform the way you manage your storage and provision new workloads; products and features that will allow you to create a flexible offering. VMware has been and is working hard to be a part of this space: vSphere Replication, Storage DRS, Storage IO Control, Virsto, and Profile Driven Storage are part of the “now”, but just the beginning… Virtual Volumes, Virtual Flash, and Distributed Storage have all been previewed at VMworld and are potentially what is next. Who knows what else is in the pipeline or what other vendors are working on.

If you ask me, there are exciting times ahead. Software Defined Storage is a big part of the Software Defined Data Center story and you can bet this will change datacenter architecture and operations.

** There are two excellent articles on this topic, the first by Bill Earl and the second by Christos Karamanolis; make sure to read their perspectives. **

