Yellow Bricks

by Duncan Epping


software defined storage

Tintri releases version 2.0 – Replication added!

Duncan Epping · Apr 8, 2013 ·

I have never made it a secret that I am a fan of Tintri. I just love their view on storage systems and the way they decided to solve specific problems. When I was in Palo Alto last month I had the opportunity to talk to the folks at Tintri again about what they are working on. Of course we had a discussion about the Software Defined Datacenter, and more specifically Software Defined Storage, and what Tintri would bring to the SDS era. As all of that was under strict NDA I can't share it, but what I can share are some cool details of what Tintri has just announced: version 2.0 of their storage system.

For those who have never even looked into Tintri, I suggest you catch up by reading the following two articles:

  1. Tintri – virtual machine aware storage
  2. Tintri follow-up

When I was briefed initially about Tintri back in 2011, one of the biggest areas for improvement I saw was availability. Two things were on my list to be solved: the first was the "single controller" approach they took, which was solved back in 2011. The other feature I missed was replication, and replication is the main feature announced today; it will be part of the 2.0 release of their software. What I loved about Tintri is that all the data services they offered operated at the virtual machine level, and of course the same applies to the replication announced today.

Tintri offers asynchronous replication which can go down to a recovery point objective (RPO) of 15 minutes. Of course I asked if there were plans on bringing this down, and indeed this is planned, but I can't say when. What I liked about this replication solution is that, as data is deduplicated and compressed, the amount of replication traffic is kept to a minimum. Let me rephrase that: globally deduplicated, meaning that if a block already exists at the DR site it will not be replicated to that site. This will definitely have a positive impact on your bandwidth consumption; Tintri has seen up to a 95% reduction in WAN bandwidth consumption.
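
To make the idea of globally deduplicated replication concrete, here is a minimal sketch of the general technique; the block size, hashing scheme and API are my own assumptions for illustration, not Tintri's actual implementation:

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed granularity; Tintri's actual block size is not public


class DRSite:
    """Stand-in for the remote array: compressed blocks keyed by content hash."""

    def __init__(self):
        self.blocks = {}

    def known(self, hashes):
        return {h for h in hashes if h in self.blocks}

    def store(self, block_hash, compressed):
        self.blocks[block_hash] = compressed


def replicate(vm_blocks, dr_site):
    """Ship only blocks whose content hash is unknown at the DR site."""
    by_hash = {hashlib.sha256(b).hexdigest(): b for b in vm_blocks}
    already_there = dr_site.known(by_hash.keys())
    sent = 0
    for h, block in by_hash.items():
        if h in already_there:
            continue  # block exists at the DR site: reference by hash, no WAN traffic
        dr_site.store(h, zlib.compress(block))
        sent += 1
    return sent, len(by_hash)


# Two VMs sharing most of their blocks (think: the same guest OS image).
site = DRSite()
vm1 = [bytes([i]) * BLOCK_SIZE for i in range(100)]
vm2 = vm1[:95] + [bytes([200 + i]) * BLOCK_SIZE for i in range(5)]
print(replicate(vm1, site))  # (100, 100) - cold start, everything is shipped
print(replicate(vm2, site))  # (5, 100)   - 95% of the WAN traffic avoided
```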

The nice thing about the replication technique Tintri offers is that it is well integrated with VMware vSphere, and thus it offers "VM consistent" snapshots by leveraging VMware's quiescing technology. My next obvious question was: what about Site Recovery Manager? As failover is on a per-VM basis, orchestrating / automating this would be a welcome option. Tintri is still working on this and hopes to add support for Site Recovery Manager soon. Another feature I would like to see added is grouping of virtual machines for replication consistency; again, this is something which is on the roadmap and hopefully will be added soon.

One of the other cool features added with this release is remote cloning. Remote cloning basically allows you to clone a virtual machine / template to a different array. Those who have multiple vCenter Server instances in their environment know what a pain this can be, hence the reason I feel this is one of those neat little features which you will appreciate once you have used it. It would be great if this functionality could be integrated within the vSphere Web Client as a "right click". Judging by the comments made by the Tintri team, I would expect that they are already working on deeper / tighter integration with the Web Client, and I can only hope a vSphere Web Client plugin will be released soon so that all granular VM-level data services can be managed from a single console.

All in all, a great new release by Tintri; if you ask me, this release is three huge steps forward!

Software Defined Storage; just some random thoughts

Duncan Epping · Apr 5, 2013 ·

I have been reading many articles over the last few weeks on Software Defined Storage and wrote an article on this topic a couple of weeks ago. While reading up, one thing that stood out was that every single storage/flash vendor out there has jumped on the bandwagon and (ab)uses this term wherever and whenever possible. In most of those cases, however, the term isn't backed by SDS-enabling technology or even a strategy, but let's not get into the finger-pointing contest as I think my friends who work for storage vendors are more effective at that.

The article which triggered me to write this one was released a week and a half ago by CRN. The article was a good read, so don't expect me to tear it down. It just had me thinking about various things, and what better way to clear your head than to write an article about it. Let's start with the following quote:

While startups and smaller software-focused vendors are quick to define software-defined storage as a way to replace legacy storage hardware with commodity servers, disk drives and flash storage, large storage vendors are not giving ground in terms of the value their hardware offers as storage functionality moves toward the software layer.

Let me also pull out this comment by Keith Norbie in the same article, as I think Keith hit the nail on the head:

Norbie said to think of the software-defined data center as a Logitech Harmony remote which, when used with a home theater system, controls everything with the press of a button.

If you take a look at how Keith's quote relates to Software Defined Storage, it would mean that you should be able to define EVERYTHING via software. Just like you can simply program the Logitech Harmony remote to work with all your devices, you should be able to configure your platform in such a way that spinning up new storage objects can be done at the touch of one button! Now getting back to the first quote: whether functionality moves out of a storage system to a management tool or even to the platform is irrelevant if you ask me. If your storage system has an API and it allows you to do everything programmatically, you are halfway there.
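
As a thought experiment, the "halfway there" bar is not that high; all it takes is something like the sketch below, where the endpoint and payload are hypothetical stand-ins for whatever API a given array exposes:

```python
import requests  # generic HTTP client; the endpoint below is a hypothetical example

ARRAY_API = "https://array.example.com/api/v1"


def provision_volume(name, size_gb, replicated=False):
    """The 'one button' experience: a single call creates a storage object."""
    response = requests.post(
        f"{ARRAY_API}/volumes",
        json={
            "name": name,
            "sizeGB": size_gb,
            "replication": {"enabled": replicated, "rpoMinutes": 15},
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["id"]


# An orchestrator (the "Harmony remote") can chain calls like this across
# compute, network and storage without a storage admin in the loop.
volume_id = provision_volume("tier1-prod-01", size_gb=512, replicated=True)
```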

I understand that many of the startups like to make potential customers believe otherwise, but the opportunity is there for everyone if you ask me. Yes, that includes old-timers like EMC / NetApp / IBM (etc.) and their "legacy" arrays, as some of the startups like to label them. Again, don't get me wrong… playing in the SDS space will require significant changes to most storage platforms, as most were never architected for this use case. Most are currently not capable of creating thousands of new objects programmatically. Many don't even have a public API.

However, what is missing today is not just a public API on most storage systems; it is also the platform, which doesn't allow you to efficiently manage these storage systems through those APIs. When I say platform I refer to vSphere, but I guess the same applies to Hyper-V, KVM, Xen, etc. Although various sub-components are already there, like the vSphere APIs for Array Integration (VAAI) and the vSphere APIs for Storage Awareness (VASA), there are still a lot of capabilities missing. A good example would be defining and setting specific data services at virtual disk granularity, or end-to-end Quality of Service for virtual disks or virtual machines, or automatic instantiation of storage objects during virtual machine provisioning without manual action required from your storage admin. Of course, all of this from a single management console…
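
To illustrate the granularity that is missing, a provisioning request could carry data services per virtual disk instead of per LUN. Every name below is invented for the example; this is not an existing vSphere or VASA interface:

```python
class Array:
    """Stub standing in for a storage system with a programmable API."""

    def create_object(self, size_gb, **services):
        # A real array would instantiate a storage object here.
        return {"size_gb": size_gb, **services}


# Each disk carries its own service definition rather than inheriting the LUN's.
vm_request = {
    "name": "web-01",
    "disks": [
        # OS disk: no replication needed, but cap it so it cannot starve others.
        {"label": "os", "size_gb": 40,
         "services": {"replicated": False, "iops_limit": 500}},
        # Data disk: carries the application, so replicate it and reserve IOPS.
        {"label": "data", "size_gb": 200,
         "services": {"replicated": True, "iops_reservation": 2000}},
    ],
}


def provision(request, array):
    """One storage object per virtual disk, created during VM provisioning,
    with no manual action from the storage admin."""
    return [array.create_object(disk["size_gb"], **disk["services"])
            for disk in request["disks"]]


print(provision(vm_request, Array()))
```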

If you look at VMware vSphere and what is being worked on for the future, you know those capabilities will come at some point; in this case I am referring to what was previewed at VMworld as "virtual volumes" (sometimes also referred to as VVOLs), but this will take time… Yes, I know some storage vendors already offer some of this granularity (primarily the startups out there), but can you define/set this from your favorite virtual infrastructure management solution during the provisioning of a new workload? Or do you need to use various tools to get the job done? If you can define QoS on a per-VM basis, is it end-to-end? What about availability / disaster recovery: do they offer a full solution for that? If so, is it possible to simply integrate it with other solutions like, for instance, Site Recovery Manager?

I think exciting times are ahead of us; but let's all be realistic… they are ahead of us. There is no "Logitech Harmony" experience yet, but I am sure we will get there in the (near) future.

Startup Intro: AetherStore

Duncan Epping · Mar 22, 2013 ·

Every once in a while you see a solution by a startup and you get all excited. AetherStore is one of those types of solutions. The funny thing is that AetherStore is not directly related to my day-to-day job, but I can fully relate to their pitch. So what is AetherStore, and who are the folks behind it?

AetherStore was founded by three graduates from Scotland's University of St Andrews. This by itself is worth mentioning in my opinion, as especially in this space I don't typically see an enormous amount of innovation coming out of Europe. (Although since then they have moved to the US.) With experience in distributed systems, fault tolerance, databases and storage, it is not surprising to see what problems they are trying to solve and how they intend to solve them.

AetherStore is indeed a storage solution, as you probably had already guessed. AetherStore is all about using spare resources, in this case storage resources. Essentially, what AetherStore aims to do for your company is leverage the available local disk space of your desktops (and servers, for that matter) and offer that up as a "data store". In other words: if you have 20 desktops, each with a 1 TB disk of which only 100 GB is used, then 900 GB of each disk can be used for other purposes. Now, the reality of course is that it isn't possible to use the full 900 GB for other purposes, but you get my point.

AetherStore essentially is a distributed data store solution. This distributed data store is served up to users as a regular network file share, and all the magic AetherStore does is hidden from the user. I guess the big questions that pop up immediately are: what about availability, security and performance? All three of those are typically what keeps either the user or the administrator busy. AetherStore solves those problems in various ways:

  • Performance: a local cache is used to optimize the end-user experience
  • Availability: data is replicated to multiple "nodes", meaning that if a "node" fails the data can be reconstructed. On top of that, AetherStore offers the ability to back up (and restore) data to the "cloud" (Mozy, Amazon, etc.)
  • Security: Data is encrypted

That is not all; on top of that, AetherStore offers versioning of files and ensures efficiency by offering deduplication. I guess it all sounds very promising, right? In my opinion it does, and it is one of those solutions that I have on my "watch list".
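
Purely as an illustration of how chunking, deduplication, encryption and replication could fit together in a store like this, here is my own sketch; the replication factor and the toy cipher are assumptions, not AetherStore's actual design:

```python
import hashlib
import os

REPLICAS = 3  # assumed replication factor; AetherStore's real value may differ


def toy_encrypt(data, key):
    """XOR stand-in for real encryption, purely to show where it sits in the flow."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def store_file(path, nodes, key, chunk_size=64 * 1024):
    """Chunk a file, dedup by content hash, encrypt, replicate across desktops."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest = hashlib.sha256(chunk).hexdigest()
            # Dedup: a chunk any node already holds is never stored twice.
            if any(digest in node for node in nodes):
                continue
            encrypted = toy_encrypt(chunk, key)
            # Availability: REPLICAS copies survive the loss of REPLICAS-1 desktops.
            for node in nodes[:REPLICAS]:
                node[digest] = encrypted


nodes = [{} for _ in range(5)]  # each dict models one desktop's spare disk space
store_file(__file__, nodes, key=os.urandom(32))
print(sum(len(n) for n in nodes))  # every unique chunk is held REPLICAS times
```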

I do wonder what the requirements are when it comes down to availability of data when people move around between different desktops, and when desktops are powered off or restarted by users at random. I also would like to point out here that I have not played with AetherStore, nor is this article sponsored, nor am I affiliated with AetherStore in any way. This is simply an introduction to a cool startup which managed to intrigue / interest me with their technology.

If you want to find out more about AetherStore, make sure to sign up on http://www.aetherstore.com/ for early access if you are interested, and/or follow them on Twitter. If you want to know more, I can recommend this white paper about AetherStore as it reveals some more details of the implementation.

Why the world needs Software Defined Storage

Duncan Epping · Mar 6, 2013 ·

Yesterday I was at a Software Defined Datacenter event organized by IBM and VMware. The famous Cormac Hogan presented on Software Defined Storage, and I very much enjoyed hearing about the VMware vision and of course Cormac's take on this. Coincidentally, last week I read this article by long-time community guru Jason Boche on VAAI and the number of VMs per volume, and after a discussion with a customer yesterday (at the event) about their operational procedures for provisioning new workloads, I figured it was time to write down my thoughts.

I have seen many different definitions so far for Software Defined Storage, and I guess there is a grain of truth in all of them. Before I explain what it means to me, let me describe the challenges people commonly face today.

In a lot of environments, managing storage and the associated workloads is a tedious task. It is not uncommon to see large spreadsheets with long lists of LUNs, IDs, capabilities, groupings and whatever else is relevant to them and their workloads. These spreadsheets are typically used to decide where to place a virtual machine or virtual disk. Based on the requirements of the application, a specific destination will be selected. On top of that, a selection will need to be made based on the currently available disk space of a datastore and of course the current IO load. You do not want to randomly place your virtual machine and find out two days later that you are running out of disk space… Well, that is if you have a relatively mature provisioning process. Of course it is also not uncommon to just pick a random datastore and hope for the best.

To be honest, I can understand why many people randomly provision virtual machines. Keeping track of virtual disks, datastores, performance, disk space and other characteristics… it is simply too much, and boring. Didn't we invent computer systems to do these repeatable, boring tasks for us? That leads us to the question: where and how should Software Defined Storage help you?
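
Just to show how mechanical that "boring" decision really is, here is a minimal sketch that automates the spreadsheet; the inputs and the 20% headroom threshold are assumptions for illustration, not a product:

```python
def pick_datastore(datastores, vm_size_gb, headroom=0.2):
    """Filter on capacity, then prefer the least loaded datastore.

    Each datastore is a dict such as:
      {"name": "ds01", "free_gb": 800, "capacity_gb": 2000, "latency_ms": 4.2}
    """
    candidates = [
        ds for ds in datastores
        # Keep a buffer so we don't discover a full datastore two days later.
        if ds["free_gb"] - vm_size_gb >= headroom * ds["capacity_gb"]
    ]
    if not candidates:
        raise RuntimeError("no datastore satisfies the capacity constraint")
    # Among those that fit, take the one with the lowest observed IO load.
    return min(candidates, key=lambda ds: ds["latency_ms"])


datastores = [
    {"name": "ds01", "free_gb": 150, "capacity_gb": 2000, "latency_ms": 3.1},
    {"name": "ds02", "free_gb": 900, "capacity_gb": 2000, "latency_ms": 7.9},
    {"name": "ds03", "free_gb": 700, "capacity_gb": 2000, "latency_ms": 2.4},
]
print(pick_datastore(datastores, vm_size_gb=100)["name"])  # -> ds03
```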

A common theme recurring in many “Software Defined” solutions presented by VMware is:

Abstract, Pool, Automate.

This also applies to Software Defined Storage in my opinion. These are three basic requirements that a Software Defined Storage solution should meet. But what does this mean, and how does it help you? Let me try to make some sense out of that nice three-word marketing slogan:

Software Defined Storage should enable you to provision workloads to a pool of virtualized physical resources based on service level agreements (defined in a policy) in an automated fashion.

I understand that is a mouthful, so let's elaborate a bit more. Think about the challenges I described above, or what Jason described with regards to "VMs per volume" and how various different components can impact your service level. A Software Defined Storage (SDS) solution should be able to intelligently place virtual disks (virtual machines / vApps) based on the policy selected for the object (virtual disk / machine / appliance). These policies typically contain characteristics of the provided service level. On top of that, a Software Defined Storage solution should take risks / constraints into account, meaning that you don't want your workload to be deployed to a volume which is running out of disk space, for instance.

What about those characteristics, what are those? Characteristics could be anything; here are two simple examples to make it a bit more obvious (sketched in code after the list):

  • Does your application require recoverability after a disaster? –> SDS selects a destination which is replicated, or instructs the storage system to create a replicated object for the VM
  • Does your application require a certain level of performance? –> SDS selects a destination that can provide this performance, or instructs the storage system to reserve storage resources for the VM
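
Here is a sketch of how those two characteristics could drive placement; the policy fields and datastore capabilities are invented for the example, not any vendor's API:

```python
policy = {"replicated": True, "min_iops": 5000}  # the service level, as a policy

datastores = [
    {"name": "ds01", "replicated": False, "max_iops": 20000, "free_gb": 900},
    {"name": "ds02", "replicated": True,  "max_iops": 8000,  "free_gb": 700},
    {"name": "ds03", "replicated": True,  "max_iops": 3000,  "free_gb": 800},
]


def compliant(ds, policy):
    """A destination qualifies only if it offers every requested capability."""
    return ((ds["replicated"] or not policy["replicated"])
            and ds["max_iops"] >= policy["min_iops"])


matches = [ds for ds in datastores if compliant(ds, policy)]
print([ds["name"] for ds in matches])  # -> ['ds02']: replicated and fast enough
```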

Now, this all sounds a bit vague, but I am purposely trying to avoid using product or feature names. Software Defined Storage is not about a particular feature, product or storage system. And although I dropped the word policy, note that enabling Profile Driven Storage within vCenter Server does not give you a Software Defined Storage solution. It shouldn't matter either (to a certain extent) whether you are using EMC, NetApp, Nimbus, a VMware software solution or any of the other thousands of different storage systems out there. Any of those systems, or even a combination of them, should work in the software defined world. To be clear: in my opinion, today there isn't such a thing as a Software Defined Storage product; it is a strategy. It is a way of operating that particular part of your datacenter.

To be fair, there is a huge difference between the various solutions. There are products and features out there that will enable you to build a solution like this and transform the way you manage your storage and provision new workloads; products and features that will allow you to create a flexible offering. VMware has been and is working hard to be a part of this space: vSphere Replication / Storage DRS / Storage IO Control / Virsto / Profile Driven Storage are part of the "now", but just the beginning… Virtual Volumes, Virtual Flash and Distributed Storage have all been previewed at VMworld and are potentially what is next. Who knows what else is in the pipeline or what other vendors are working on.

If you ask me, there are exciting times ahead. Software Defined Storage is a big part of the Software Defined Data Center story and you can bet this will change datacenter architecture and operations.

** There are two excellent articles on this topic: the first by Bill Earl, and the second by Christos Karamanolis. Make sure to read their perspectives. **
