Virtual SAN (related) PEX Updates

I am at VMware Partner Exchange this week and figured I would share some of the Virtual SAN related updates.

  • On the 6th of March there is an online Virtual SAN event with Pat Gelsinger, Ben Fathi and John Gilmartin… Make sure to register for it!
  • Ben Fathi (VMware CTO) stated that VSAN will be GA in Q1, more news in the upcoming weeks
  • Maximum cluster size has been increased from 8 (beta) to 16 according to Ben Fathi; the VMware VSAN engineering team is ahead of schedule!
  • VSAN has linear scalability, close to a million IOPS with 16 hosts in a cluster (100% read, 4K blocks). Mixed IOPS close to half a million. All of this with less than 10% CPU/Memory overhead. That is impressive if you ask me. Yeah yeah I know, numbers like these are just a part of the overall story… still it is nice to see that these kinds of performance numbers can be achieved with VSAN.
  • I noticed a tweet by Chetan Venkatesh and it looks like Atlantis ILIO USX (an in-memory storage solution) has been tested on top of VSAN and they were capable of hitting 120K IOPS using 3 hosts, WOW. There is a white paper on this topic to be found here, interesting read.
  • It was also reiterated that customers who sign up and download the beta will get a 20% discount on their first purchase of 10 VSAN licenses or more!
  • Several hardware vendors announced support for VSAN, a nice short summary by Alberto to be found here.

Operational simplicity through Flash

A couple of weeks back I had the honor of being one of the panel members at the opening of the Pure Storage office in the Benelux. The topic of course was flash, and the primary discussion was around the benefits. The next day I tweeted a quote of one of the answers I gave during the session, which was picked up by Frank Denneman in one of his articles. This is the quote:

David Owen responded to my tweet saying that many performance acceleration platforms introduce an additional layer of complexity, and Frank followed up on that in his article. However this is not what my quote was referring to. First of all, I don’t agree with David that many performance acceleration solutions increase operational complexity. However, I do agree that they don’t always make life a whole lot easier either.

I guess it is fair to say that performance acceleration solutions (hypervisor-based SSD caching) are not designed to replace your storage architecture or to simplify it. They are designed to enhance it, to boost the performance. During the Pure Storage panel session I was talking about how flash changed the world of storage, or better said is changing the world of storage. When you purchased a storage array in the past two decades, it would come with days' worth of consultancy. Two days was typically the minimum, and in some cases a week or even more (depending on the size, the different functionality used, etc.). And that was just the install / configure part. It also required the administrators to be trained, in some cases multiple five-day courses. This says something about the complexity of these systems.

The complexity however was not introduced by storage vendors just because they wanted to sell extra consultancy hours. It was simply the result of how the systems were architected, which by itself was the result of one major constraint: magnetic disks. But the world is changing, primarily because a new type of storage was introduced: flash!

Flash allowed storage companies to re-think their architecture. It is probably fair to state that this was kickstarted by the startups out there who took flash and saw it as their opportunity to innovate. Innovating by removing complexity. Removing (front-end) complexity by flattening their architecture.

Complex constructs to improve performance are no longer required, as (depending on which type you use) a single flash disk delivers more than 1000 magnetic disks typically do. Even when it comes to resiliency, most new storage systems introduced different types of solutions to mitigate (disk) failures. No longer is a 5-day training course required to manage your storage systems. No longer do you need weeks of consultancy just to install/configure your storage environment. In essence, flash removed a lot of the burden that was placed on customers. That is the huge benefit of flash, and that is what I was referring to with my tweet.

One thing left to say: Go Flash!

How about an All Flash Virtual SAN?

Yeah, that title got your attention right… For now it is just me writing about it; nothing has been announced or promised. At VMworld I believe it was Intel who demonstrated the possibilities in this space, an All Flash Virtual SAN. A couple of weeks back, during my holiday, someone pointed me to a couple of articles around SSD endurance. Typically these types of articles deal with the upper end of the spectrum and as such are irrelevant to most of us, and some of the articles I have read in the past around endurance were disappointing to be honest.

TechReport.com however decided to look at consumer grade SSDs. We are talking about SSDs like the Intel 335, Samsung 840 series, Kingston Hyper-X and the Corsair Neutron. All of the SSDs used had a capacity of around 250GB and are priced anywhere between $175 and $275. Now if you look at the guarantees given in terms of endurance, we are talking about anything ranging from "20GB of writes per day for the length of its three-year warranty" for the Intel (22TB in total) to three years and 192TB in total for the Kingston, and anything in between for the other SSDs.
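For those wondering where that 22TB figure for the Intel comes from, it is simply the daily write guarantee multiplied out over the warranty period. A quick back-of-the-napkin check (my own arithmetic, just using the numbers quoted above):

    # Intel 335: "20GB of writes per day" over a three-year warranty
    print(20 * 365 * 3 / 1000.0)    # ~21.9 TB, which lines up with the 22TB total mentioned above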

Tech Report had set their first checkpoint at 22TB. After running through a series of tests, which are described in the article, they compare the results between the various SSDs after 22TB of writes. Great to see that all SSDs did what they promised: all of them passed the 22TB mark without any issues. They had another checkpoint at the 200TB mark, which showed the first signs of weakness. As expected, the lower end SSDs dropped out first. The next checkpoint was set at the 300TB mark, where they also added an unpowered retention test to see how well the drives retain data when unplugged. So far impressive results, and a blog series I will follow with interest. The articles clearly show that from an endurance perspective the SSDs perform a lot better than most had assumed in past years. It is fair to say that the consumer grade SSDs are up to the challenge.

Considering the low price points of these flash devices, I can see how an All Flash Virtual SAN solution would be possible, leveraging these consumer grade SSDs as the capacity tier (reads) and using enterprise grade SSDs to provide write performance (write buffer). Hopefully we will see the capacity of these types of devices increase even further; today some of them go up to 500GB, others up to 800GB. Wouldn't it be nice to have a 1TB (or more) version?

Anyway, I am excited and definitely planning on running some tests with an all flash Virtual SAN solution in the future… What about you?

** 500TB blog update! **
** 600TB blog update! **

How to calculate what your Virtual SAN datastore size should be

I have had this question so many times that I figured I would write an article about it: how to calculate what your Virtual SAN datastore size should be? Ultimately this determines which kind of server hardware you can use, which disk controller you need and which disks… So it is important that you get it right. I know the VMware Technical Marketing team is developing collateral around this topic; when that has been published I will add a link here. Let's start with a quote by Christian Dickmann, one of our engineers, as it is the foundation of this article:

In Virtual SAN your whole cluster acts as a hot-spare

Personally I like to work top-down, meaning that I start with an average for virtual machines or a total combined number. Let's take an example to go through the exercise, as that makes it a bit easier to digest.

Let's assume the average VM disk size is 50GB. On average the VMs have 4GB of memory provisioned. And we have 100 virtual machines in total that we want to run on a 4 host cluster. Based on that info the formula would look something like this:

(total number of VMs * average VM size) + (total number of VMs * average VM memory size) = total capacity required

In our case that would be:

(100 * 50GB) + (100 * 4GB) = 5400 GB

So that is it? Well not really; like every storage / file system there is some overhead, and we will need to take the "failures to tolerate" into account. If I set my "failures to tolerate" to 1 then I would have 2 copies of my VMs, which means I need 5400 GB * 2 = 10800 GB. Personally I also add an additional 10% in disk capacity to ensure we have room for things like: metadata, log files, vmx files and some small snapshots when required. Note that VSAN by default provisions all VMDKs as thin objects (note that swap files are thick, Cormac explained that here), so there should be room available regardless. Better safe than sorry though. This means that 10800 GB actually becomes 11880 GB. I prefer to round this up to 12TB. The formula I have been using thus looks as follows:

(((Number of VMs * Avg VM size) + (Number of VMs * Avg mem size)) * (FTT+1)) + 10%
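For those who prefer to script this, below is a minimal sketch of that formula in Python. The function and variable names are my own; the (FTT+1) multiplier and the 10% overhead simply follow the formula above.

    # Rough Virtual SAN datastore size estimate, following the formula above
    def vsan_capacity_gb(num_vms, avg_vm_disk_gb, avg_vm_mem_gb, ftt=1, overhead=0.10):
        raw = num_vms * (avg_vm_disk_gb + avg_vm_mem_gb)   # VMDK capacity plus swap space
        protected = raw * (ftt + 1)                        # mirror copies for "failures to tolerate"
        return protected * (1 + overhead)                  # ~10% extra for metadata, logs, vmx files, small snapshots

    # The example from this article: 100 VMs, 50GB disks, 4GB memory, FTT=1
    print(vsan_capacity_gb(100, 50, 4))                    # ~11880 GB, which I round up to 12TB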

Now the next step is to see how you divide that across your hosts. I mentioned we would have 4 hosts in our cluster. We have two options: we create a cluster that can re-protect itself after a full host failure, or we create a cluster that cannot. Just to clarify, in order to have 1 host worth of spare capacity available we will need to divide the total capacity by 3 instead of 4. Let's look at those two options and what the impact is:

  • 12TB / 3 hosts = 4TB per host (for each of the 4 hosts)
    • Allows you to re-protect (sync/mirror) all virtual machine objects even when you lose a full host
    • All virtual machines will maintain availability levels when doing maintenance
    • Requires an additional 1TB per host!
  • 12TB / 4 hosts = 3TB per host (for each of the 4 hosts)
    • If all disk space is consumed, when a host fails virtual machines cannot be “re-protected” as there would be no capacity to sync/mirror the objects again
    • When entering maintenance mode data availability cannot be maintained as there would be no room to sync/mirror the objects to another disk

Now if you look at the numbers, we are talking about an additional 1TB per host. With 4 hosts, and let's assume we are using 2.5″ SAS 900GB Hitachi drives, that would be 4 additional drives at a cost of around 1000 per drive. When using 3.5″ SATA drives the cost would be even lower. Although this is just a number I found on the internet, it does illustrate that the cost of providing additional availability could be small. Prices could differ depending on the server brand used. But even at double the cost, I would go for the additional drives and as such additional "hot spare capacity".
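To put the two options side by side, here is a quick continuation of the sketch above (again my own back-of-the-napkin Python, using the 12TB from the example):

    # Per-host capacity: with and without spare capacity for re-protection
    total_tb = 12
    hosts = 4

    print(total_tb / hosts)        # 3TB per host: no room left to re-protect after a full host failure
    print(total_tb / (hosts - 1))  # 4TB per host: the cluster can re-sync/mirror all objects after losing a host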

To make life a bit easier I created a calculator. I hope this helps everyone who is looking at configuring hosts for their Virtual SAN based infrastructure.

VSAN VDI Benchmarking and Beta refresh!

I was reading this blog post on VSAN VDI benchmarking today on Vroom, the VMware Performance blog. You see a lot of people doing synthetic tests (max IOPS with sequential reads) on all sorts of storage devices, but lately more and more vendors are doing these more "real world" performance tests. While reading this article about VDI benchmarking (and I suggest you check out all parts: part 1, part 2, part 3), there was one thing that stood out to me, and that was the comparison between VSAN and an all flash array.

The following quotes show the strength of VSAN if you ask me:

we see that VSAN can consolidate 677 heavy users (VDImark) for 7-node and 767 heavy users for 8-node cluster. When compared to the all flash array, we don’t see more than 5% difference in the user consolidation.

Believe me when I say that 5% is not a lot. If you are actively looking at various solutions, I would highly recommend including the "overhead costs" in your criteria list, as depending on the solution chosen this could make a substantial difference. I have seen other solutions requiring a lot more resources. But what about response time? Because that is where the typical all flash array shines: ultra low latency. How about VSAN?

Similar to the user consolidation, the response time of Group-A operations in VSAN is similar to what we saw with the all flash array.

Both very interesting results if you ask me. Especially the < 5% difference in user consolidation is what stood out to me most! Once again, for more details on these tests read the VDI Benchmarking blog part 1, part 2, part 3!

Beta Refresh

For those who are testing VSAN, there is a beta refresh available as of today. This release has a fix for the AHCI driver issue… and it increases the disk group limit from 6 to 7 HDDs. This will come in handy, as many servers have 8, 16 or 24 disk slots, allowing you to do 7 HDDs + 1 SSD per group. Also, some additional RVC commands have been added in the storage policy space; I am sure they will come in handy!

A nice side effect of the number of HDDs going up is an increase in max capacity:

(8 hosts * (5 diskgroups * 7 HDDs)) * Size of HDD = Total capacity

With 2 TB disks this would result in:

(8 * (5 * 7)) * 2TB = 560TB
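Or, as a quick sketch in case you want to plug in your own disk sizes (the variable names are mine, the limits are the ones mentioned above):

    # Maximum raw capacity with the beta refresh limits
    hosts = 8
    diskgroups_per_host = 5
    hdds_per_diskgroup = 7        # raised from 6 to 7 in this beta refresh
    hdd_size_tb = 2

    print(hosts * diskgroups_per_host * hdds_per_diskgroup * hdd_size_tb)   # 560 TB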

Now keep on testing with VSAN and don't forget to report feedback through the community forums or your VMware rep.

Startup intro: Coho Data

Today a new startup is revealed named Coho Data, formerly known as Convergent.io. Coho Data was founded by Andrew Warfield, Keir Fraser and Ramana Jonnala, who are probably best known for the work they did at Citrix on XenServer. For those who care, they are backed by Andreessen Horowitz. What is it they introduced / revealed this week?

Coho Data introduces a new scale-out hybrid storage solution (NFS for VM workloads). With hybrid meaning a mix of SATA and SSD, for obvious reasons: SATA bringing you capacity and flash providing raw performance. Let me point out that Coho is not a hyperconverged solution, it is a full storage system.

What does it look like? It is a 2U box which holds 2 "MicroArrays", with each MicroArray having 2 processors, 2 x 10GbE NIC ports and 2 Intel 910 PCIe cards. Each 2U block provides you 39TB of capacity and ~180K IOPS (random 80/20 read/write, 4K block size), starting at $2.50 per GB pre-dedupe & compression (which they of course offer). A couple of things I liked looking at their architecture: first and probably foremost the "scale-out" architecture, scale to infinity in a linear fashion is what they say. On top of that, it comes with an OpenFlow-enabled 10GbE switch to allow for ease of management and again scalability.

If you look closely at how they architected their hardware, they created these highspeed IO lanes: 10GbE NIC <–> CPU <–> PCIe Flash Unit. Each lane has its dedicated CPU, NIC port and on top of that the PCIe flash, allowing for optimal performance, efficiency and fine-grained control. Nice touch if you ask me.

Another thing I really liked was their UI. You can really see they put a lot of thought into the user experience aspect by keeping things simple and presenting data in an easily understandable way. I wish every vendor did that. I mean, if you look at the screenshot below, how simple does that look? Dead simple right!? I’ve seen some of the other screens, like for instance for creating a snapshot schedule… again the same simplicity. Apparently, and I have not tested this but I will take them at their word, they brought that simplicity all the way down to the "install / configure" part of things. Getting Coho Data up and running literally only takes 15 minutes.

What I also liked very much about the Coho Data solution is that Software-defined Networking (SDN) and Software-defined Storage (SDS) are tightly coupled. In other words, Coho configures the network for you… As just said, it takes 15 minutes to set up. Try creating the zoning / masking scheme for a storage system and a set of LUNs these days; even that takes more than 15 – 20 minutes. There aren’t too many vendors combining SDN and SDS in a smart fashion today.

When they briefed me they gave me a short demo and Andy explained the scale-out architecture. During the demo it happened various times that I could draw a parallel between the VMware virtualization platform and their solution, which made it easy for me to understand and relate to their solution. For instance, Coho Data offers what I would call DRS for Software-Defined Storage. If for whatever reason defined policies are violated, then Coho Data will balance the workload appropriately across the cluster. Just like DRS (and Storage DRS) does, Coho Data will do a risk/benefit analysis before initiating the move. I guess the logical question would be: why would I want Coho to do this when VMware can also do this with Storage DRS? Well, keep in mind that Storage DRS works "across datastores", but as Coho presents a single datastore you need something that allows you to balance within it.

I guess the question then remains: what do they lack today? Well, today as a 1.0 platform Coho doesn’t offer replication outside of their own cluster. But considering they have snapshotting in place I suspect their architecture already caters for it, and it is something they should be able to release fairly quickly. Another thing which is lacking today is a vSphere Web Client plugin, but then again, if you look at their current UI and the simplicity of it, I do wonder if there is any point in having one.

All in all, I have been impressed by these newcomers in the SDS space and I can’t wait to play around with their gear at some point!

Designing your hardware for Virtual SAN

Over the past couple of weeks I have been watching the VMware VSAN Community Forum and Twitter with close interest. One thing that struck me was the type of hardware people used to test VSAN on. In many cases this is the type of hardware one would use at home, for their desktop. Now I can see why that happens, I mean something new / shiny and cool is released and everyone wants to play around with it, but not everyone has the budget to buy the right components… And as long as that is for “play” only that is fine, but lately I have also noticed that people are looking at building an ultra cheap storage solution for production. But guess what?

Virtual SAN reliability, performance and overall experience is determined by the sum of the parts

You say what? Not shocking, right, but something that you will need to keep in mind when designing a hardware / software platform. Simple things can impact your success: first and foremost check the HCL, and think about components like:

  • Disk controller
  • SSD / PCIe Flash
  • Network cards
  • Magnetic Disks

Some thoughts around this, for instance the disk controller. You could leverage a 3Gb/s on-board controller, but when attaching let's say 5 disks to it and a high performance SSD, do you think it can still cope, or would a 6Gb/s PCIe disk controller be a better option? Or even leverage the 12Gb/s that some controllers offer for SAS drives? Not only can this make a difference in terms of the number of IOPS you can drive, it can also make a difference in terms of latency! On top of that, there will be a difference in reliability…
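To illustrate why the controller link speed matters, here is a rough back-of-the-napkin calculation. These are my own illustrative numbers, not from any spec sheet, and they assume the usual 8b/10b encoding overhead on SATA/SAS links:

    # Usable bandwidth per SATA/SAS link, roughly 80% of the line rate due to 8b/10b encoding
    def usable_mb_per_s(link_gbps):
        return link_gbps * 1000 / 8 * 0.8

    print(usable_mb_per_s(3))    # ~300 MB/s: a single fast SSD can already saturate this
    print(usable_mb_per_s(6))    # ~600 MB/s: headroom for a high performance SSD
    print(usable_mb_per_s(12))   # ~1200 MB/s: 12Gb/s SAS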

I guess the next component is the SSD / Flash device; this one is hopefully obvious to each of you. But don’t let those performance tests you see on Tom’s or AnandTech fool you: there is more to an SSD than just sheer IOPS. For instance durability: how many writes per day, for how many years, can your SSD handle? Some of the enterprise grade drives can handle 10 full writes or more per day for 5 years. You cannot compare that with some of the consumer grade drives out there, which obviously will be cheaper but also will wear out a lot faster! You don’t want to find yourself replacing SSDs every year at random times.
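A rough way to compare endurance is to multiply the drive writes per day (DWPD) out over the warranty period. A simple sketch with illustrative numbers (these are not the specs of any particular drive):

    # Total lifetime writes in TB, based on drive writes per day (DWPD)
    def lifetime_writes_tb(capacity_gb, dwpd, years):
        return capacity_gb * dwpd * 365 * years / 1000.0

    print(lifetime_writes_tb(400, 10, 5))    # enterprise-class example: ~7300 TB written over 5 years
    print(lifetime_writes_tb(250, 0.3, 3))   # consumer-class example: ~82 TB written over 3 years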

Of course network cards are a consideration when it comes to VSAN. Why? Well, because I/O will more than likely hit the network. Personally, I would rule out 1GbE… Or you would need to go for multiple cards and ports per server, but even then I think 10GbE is the better option here. Most 10GbE cards are of decent quality, but make sure to check the HCL and any recommendations around configuration.

And last but not least, magnetic disks… Quality should always come first here. I guess this goes for all of the components; I mean, you are not buying an empty storage array and filling it up with random components either, right? Think about what your requirements are. Do you need 10k / 15k RPM, or does 7.2k suffice? SAS vs SATA vs NL-SATA? Also, keep in mind that performance comes at a cost (typically capacity). Another thing to realize: high capacity drives are great for… yes, adding capacity indeed, but keep in mind that when IO needs to come from disk, the number of IOPS you can drive and your latency will be determined by these disks. So if you are planning on increasing the “stripe width” then it might also be useful to factor this in when deciding which disks you are going to use.

I guess to put it differently, if you are serious about your environment and want to run a production workload then make sure you use quality parts! Reliability, performance and ultimately your experience will be determined by these parts.

<edit> Forgot to mention this, but soon there will be “Virtual SAN” ready nodes… This will make your life a lot easier I would say.

</edit>