Software Defined Storage, which phase are you in?!

Working within R&D at VMware means you typically work with technology which is 1-2 years out, and discuss futures of products which are 2-3 years out. Especially in the storage space a lot has changed. Not just through innovations within the hypervisor by VMware, like Storage DRS, Storage IO Control, VMFS-5, VM Storage Policies (SPBM), vSphere Flash Read Cache, Virtual SAN etc., but also by partners who build software-based solutions like PernixData (FVP), Atlantis (ILIO) and SanDisk FlashSoft. Of course there is the whole Server SAN / hyper-converged movement with Nutanix, ScaleIO, Pivot3, SimpliVity and others. Then there is the whole slew of new storage systems, some of which are scale-out and all-flash, while others focus more on simplicity; here we are talking about Nimble, Tintri, Pure Storage, XtremIO, Coho Data, SolidFire and many more.

Looking at it from my perspective, I would say there are multiple phases when it comes to the SDS journey:

  • Phase 0 – Legacy storage with NFS / VMFS
  • Phase 1 – Legacy storage with NFS / VMFS + Storage IO Control and Storage DRS
  • Phase 2 – Hybrid solutions (Legacy storage + acceleration solutions or hybrid storage)
  • Phase 3 – Object granular policy driven (scale out) storage

I have written about Software Defined Storage multiple times in the last couple of years and have worked with various solutions which are considered to be “Software Defined Storage”. I have a certain view of what the world looks like. However, when I talk to some of our customers, reality is different: some seem very happy with what they have in Phase 0. Although all of the above is the way of the future, and for some may be reality today, I do realise that Phase 1, 2 and 3 may be far away for many. I would like to invite all of you to share:

  1. Which phase you are in, and where you would like to go?
  2. What you are struggling with most today that is driving you to look at new solutions?

Quality of components in hybrid / all-flash storage

Today I was answering some questions on the VMTN forums and one of the questions was around the quality of components in some of the all-flash / hybrid arrays. This person kept coming back to the type of flash used (eMLC vs MLC, SATA vs NL-SAS vs SAS). One of the comments he made was the following:

I talked to Pure Storage but they want $$$ for 11TB of consumer grade MLC.

I am guessing he did a quick search on the internet, found a price for some SSDs, multiplied it and figured that Pure Storage was asking way too much… And even compared to some more traditional arrays filled with SSDs they could sound more expensive. I guess this also applies to other solutions, so I am not calling out Pure Storage here. One thing some people seem to forget is that these new storage architectures are built with flash in mind.

What does that mean? Well, everyone has heard the horror stories about consumer-grade flash wearing out extremely fast and blowing up in your face. Fortunately that is only true to a certain extent, as some consumer-grade SSDs easily reach 1PB of writes these days. On top of that there are a couple of things I think you should know and consider before making statements like these, or before being influenced by a sales team that says “well we offer SLC versus MLC so we are better than them”.

For instance (as Pure Storage lists on their website), there are many more MLC drives shipped than any other type at this point. Which means that MLC has been tested inside and out by consumers. Who can break devices in more ways than you or your QA team ever could? Right, the consumer! More importantly, if you ask me, ALL of these new storage architectures have in-depth knowledge of the type of flash they are using. That is how their systems were architected! They know how to leverage flash, they know how to write to flash, they know how to avoid fast wear-out. They developed an architecture which was not only designed but also highly optimized for flash… This is what you pay for. You pay for the “total package”, which means the whole solution, not just the flash devices that are leveraged. The flash devices are a part of the solution, and a relatively small part if you ask me. You pay for total capacity with low latency and functionality like deduplication, compression and replication (in some cases). You pay for the ease of deployment and management (operational efficiency), meaning you get to spend your time on stuff that matters to your customer… their applications.
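To make that “avoid fast wear-out” point a bit more tangible, here is a minimal, purely conceptual wear-leveling sketch in Python: new writes are steered to the erase block with the fewest program/erase cycles so no single block reaches its endurance limit early. The block count and P/E limit are made-up numbers for illustration; this is not how Pure Storage or any other vendor actually implements it.

```python
# Toy wear-leveling sketch: spread writes across erase blocks so no single
# block burns through its program/erase (P/E) budget early.
# Purely illustrative; real flash controllers and arrays are far more involved.

class WearLeveler:
    def __init__(self, block_count, pe_limit=3000):
        # P/E cycles consumed per erase block (MLC endurance is often quoted
        # in the low thousands of cycles; the limit here is an assumption).
        self.pe_count = [0] * block_count
        self.pe_limit = pe_limit

    def pick_block(self):
        # Steer the next write to the least-worn block.
        block = min(range(len(self.pe_count)), key=self.pe_count.__getitem__)
        if self.pe_count[block] >= self.pe_limit:
            raise RuntimeError("all blocks have reached end of life")
        self.pe_count[block] += 1
        return block

wl = WearLeveler(block_count=8)
for _ in range(100):
    wl.pick_block()
print(wl.pe_count)  # wear is spread evenly: [13, 13, 13, 13, 12, 12, 12, 12]
```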

You can summarize all of it in a single sentence: the physical components used in all of these solutions are just a small part of the solution. Whenever someone tries to sell you the “hardware”, that is when you need to be worried!

Looking back: Software Defined Storage…

Over a year ago I wrote an article (multiple actually) about Software Defined Storage, VSAs and different types of solutions and how flash impacts the world. One of the articles contained a diagram and I would like to pull that up for this article. The diagram below is what I used to explain how I see a potential software defined storage solution. Of course I am severely biased as a VMware employee, and I fully understand there are various scenarios here.

As I explained, the type of storage connected to this layer could be anything: DAS/NFS/iSCSI/block, who cares… The key thing here is that there is a platform sitting in between your storage devices and your workloads. All your storage resources would be aggregated into a large pool and the layer should sort things out for you based on the policies defined for the workloads running there. Now, I drew this layer coupled with the “hypervisor”, but that’s just because that is the world I live in.
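To illustrate what I mean with “the layer should sort things out for you based on the policies defined for the workloads”, here is a minimal Python sketch of a policy engine matching per-workload requirements against the capabilities of a pooled set of storage resources. The resource names and capability keys are hypothetical and only serve as an example; this is not the SPBM API or any actual product interface.

```python
# Minimal policy-driven placement sketch: aggregate heterogeneous storage into
# one pool and place each workload on a resource that satisfies its policy.
# Capability names and values are hypothetical, for illustration only.

pool = [
    {"name": "das-ssd-01",   "type": "DAS",   "max_latency_ms": 1,  "replication": False},
    {"name": "nfs-array-01", "type": "NFS",   "max_latency_ms": 10, "replication": True},
    {"name": "iscsi-lun-07", "type": "iSCSI", "max_latency_ms": 5,  "replication": True},
]

def place(workload, policy):
    """Return the first pooled resource that satisfies the workload's policy."""
    for resource in pool:
        latency_ok = resource["max_latency_ms"] <= policy["max_latency_ms"]
        replication_ok = not policy["replication"] or resource["replication"]
        if latency_ok and replication_ok:
            return f'{workload} -> {resource["name"]}'
    return f'{workload} -> no compliant storage found'

print(place("sql-vm",  {"max_latency_ms": 5,  "replication": True}))   # -> iscsi-lun-07
print(place("test-vm", {"max_latency_ms": 20, "replication": False}))  # -> das-ssd-01
```

The point of the diagram is exactly this: the workload owner only expresses requirements, and the layer in between decides where the data lands.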

Looking back at that article and looking at the state of the industry today, a couple of things stood out. First and foremost, the term “Software Defined Storage” has been abused by everyone and doesn’t mean much to me personally anymore. If someone says during a bloggers briefing “we have a software defined storage solution”, I will typically ask them to define it, or explain what it means to them. Anyway, why did I show that diagram? Well, mainly because I realised over the last couple of weeks that a couple of companies/products are heading down this path.

If you look at the diagram and, for instance, think about VMware’s own Virtual SAN product, then you can see what would be possible. I would even argue that technically a lot of it would be possible today; however, the product is still lacking in some of these areas (data services), but I expect that to be a matter of time. Virtual SAN sits right in the middle of the hypervisor, the API and Policy Engine are provided by the vSphere layer, and it has its own caching service… For now connecting SAN storage isn’t supported, but if I wanted to I could do it even today, simply by tagging “LUNs” as local disks.

Another product which comes to mind when looking at the diagram is PernixData’s FVP. PernixData managed to build a framework that sits in the hypervisor, in the data path of the VMs. They provide a highly resilient caching layer, and will be able to do both flash and memory caching in the near future. They will support different types of storage being connected with the upcoming release… If you ask me, they should be in the right position to slap additional data services like deduplication / compression / encryption / replication on top of it. I am just speculating here, and I don’t know the PernixData roadmap, so who knows…

Something completely different is EMC’s ViPR (read Chad’s excellent post on ViPR). Although it may not entirely fit the picture I drew, ViPR aims to be that layer in between you and your storage devices: abstracting it all for you and allowing for a single API to ease automation, doing this “end to end” including the storage networks in between. If they were to extend this to allow certain data services to sit in a different layer, they would pretty much be there.

Last but not least, Atlantis USX. Although Atlantis is a virtual appliance and as such a different implementation than Virtual SAN and FVP, they did manage to build a platform that basically does everything I mentioned in my original article. One thing it doesn’t directly solve is the management of the physical storage devices, but today neither does FVP or Virtual SAN (well, to a certain extent VSAN does…). I am confident that this will change when Virtual Volumes is introduced, as Atlantis should be able to leverage Virtual Volumes for those purposes.

Some may say: well, what about VMware’s Virsto? Indeed, Virsto would also fit the picture, but its end of availability was announced not too long ago. However, it has been hinted multiple times that Virsto technology will be integrated into other products over time.

Although by now “Software Defined Storage” is seen as a marketing-bingo buzzword, the world of storage is definitely changing. The question now is, I guess: are you ready to change as well?

Virtual SAN (related) PEX Updates

I am at VMware Partner Exchange this week and figured I would share some of the Virtual SAN related updates.

  • On the 6th of March there is an online Virtual SAN event with Pat Gelsinger, Ben Fathi and John Gilmartin… Make sure to register for it!
  • Ben Fathi (VMware CTO) stated that VSAN will be GA in Q1, more news in the upcoming weeks
  • Maximum cluster size has been increased from 8 (beta) to 16 according to Ben Fathi; the VMware VSAN engineering team is ahead of schedule!
  • VSAN has linear scalability: close to a million IOPS with 16 hosts in a cluster (100% read, 4K blocks), and mixed IOPS close to half a million. All of this with less than 10% CPU/memory overhead. That is impressive if you ask me. Yeah yeah, I know, numbers like these are just a part of the overall story… still it is nice to see that performance numbers like these can be achieved with VSAN. A quick back-of-the-envelope per-host breakdown follows below this list.
  • I noticed a tweet by Chetan Venkatesh and it looks like Atlantis ILIO USX (in-memory storage solution) has been tested on top of VSAN and they were capable of hitting 120K IOPS using 3 hosts, WOW. There is a white paper on this topic to be found here, interesting read.
  • It was also reiterated that customers who sign up and download the beta will get a 20% discount on their first purchase of 10 VSAN licenses or more!
  • Several hardware vendors announced support for VSAN; a nice short summary by Alberto can be found here.
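As a quick back-of-the-envelope check of that “linear scalability” claim, using only the cluster-level numbers quoted above, this is roughly what it boils down to per host:

```python
# Back-of-the-envelope per-host numbers derived from the quoted cluster totals.
hosts = 16
read_iops_cluster = 1000000   # ~1M IOPS, 100% read, 4K blocks
mixed_iops_cluster = 500000   # ~0.5M IOPS, mixed workload

print(read_iops_cluster / hosts)   # 62500.0 read IOPS per host
print(mixed_iops_cluster / hosts)  # 31250.0 mixed IOPS per host
```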

Operational simplicity through Flash

A couple of weeks back I had the honor of being one of the panel members at the opening of the Pure Storage office in the Benelux. The topic of course was flash, and the primary discussion was around the benefits. The next day I tweeted a quote of one of the answers I gave during the session, which was picked up by Frank Denneman in one of his articles. This is the quote:

David Owen responded to my tweet saying that many performance acceleration platforms introduce an additional layer of complexity, and Frank followed up on that in his article. However, this is not what my quote was referring to. First of all, I don’t agree with David that many performance acceleration solutions increase operational complexity. I do agree, however, that they don’t always make life a whole lot easier either.

I guess it is fair to say that performance acceleration solutions (hypervisor-based SSD caching) are not designed to replace your storage architecture or to simplify it. They are designed to enhance it, to boost the performance. During the Pure Storage panel session I was talking about how flash changed the world of storage, or better said, is changing the world of storage. When you purchased a storage array in the last two decades, it would come with days’ worth of consultancy. Two days typically being the minimum, and in some cases a week or even more (depending on the size, the different functionality used, etc.). And that was just the install / configure part. It also required the administrators to be trained, in some cases (not uncommonly) with multiple five-day courses. This says something about the complexity of these systems.

The complexity, however, was not introduced by storage vendors just because they wanted to sell extra consultancy hours. It was simply the result of how the systems were architected, which by itself was the result of one big constraint: magnetic disks. But the world is changing, primarily because a new type of storage was introduced: flash!

Flash allowed storage companies to rethink their architectures; it is probably fair to state that this was kickstarted by the startups out there who took flash and saw it as their opportunity to innovate. Innovating by removing complexity. Removing (front-end) complexity by flattening their architectures.

Complex constructs to improve performance are no longer required, as (depending on which type you use) a single flash disk delivers more IOPS than 1000 magnetic disks typically do. Even when it comes to resiliency, most new storage systems introduced different types of solutions to mitigate (disk) failures. No longer is a 5-day training course required to manage your storage systems. No longer do you need weeks of consultancy just to install / configure your storage environment. In essence, flash removed a lot of the burden that was placed on customers. That is the huge benefit of flash, and that is what I was referring to with my tweet.
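To put a rough number on that “a single flash disk delivers more than 1000 magnetic disks” statement, here is a quick back-of-the-envelope calculation; the IOPS figures are ballpark assumptions, not measurements of any specific device:

```python
# Rough random-IOPS comparison; both figures are ballpark assumptions.
ssd_random_iops = 75000        # a decent SATA/SAS SSD, random 4K I/O
nl_sas_7k2_random_iops = 75    # a 7.2K RPM NL-SAS magnetic disk, random I/O

print(ssd_random_iops / nl_sas_7k2_random_iops)  # 1000.0 disks to match one SSD
```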

One thing left to say: Go Flash!