
Yellow Bricks

by Duncan Epping

vSAN

600GB write buffer limit for VSAN?

Duncan Epping · May 17, 2016 ·

[Image: Write Buffer Limit for VSAN]

I get this question on a regular basis and it has been explained many, many times, so I figured I would dedicate a blog to it. Now, Cormac has written a very lengthy blog on the topic and I am not going to repeat it; I will simply point you to the math he has provided. I do, however, want to provide a quick summary:

When you have an all-flash VSAN configuration, the current write buffer limit is 600GB (this applies to all-flash only). As a result, many seem to think that when an 800GB device is used for the write buffer, 200GB will go unused. This simply is not the case. We have a rule of thumb of a 10% cache-to-capacity ratio. This rule of thumb was developed with both performance and endurance in mind, as described by Cormac in the link above. The 200GB above the 600GB write buffer limit is actively used by the flash device for endurance. Note that an SSD usually is over-provisioned by default; most of them have extra cells for endurance and write performance, which makes the experience more predictable and at the same time more reliable. The same applies in this case with the Virtual SAN write buffer.

The image at the top right side shows how this works. This SSD has 800GB of advertised capacity. The “write buffer” is limited to 600GB; however, the white space is considered “dynamic over-provisioning” capacity, as it will be actively used by the SSD automatically (SSDs do this by default). Then there is an additional x% of over-provisioning by default on all SSDs, which in the example is 28% (typical for enterprise grade), and even after that there usually is an extra 7% for garbage collection and other SSD internals. If you want to know more about why this is and how it works, Seagate has a nice blog.
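To make the math concrete, below is a quick back-of-the-envelope sketch in Python. It only reproduces the arithmetic from this post: the 600GB cap is the current all-flash write buffer limit, while the 28% and 7% figures are the example values mentioned above and will differ per drive, so treat it as an illustration rather than a sizing tool.

# Back-of-the-envelope split of an all-flash VSAN cache device,
# using the example figures from this post (not VSAN code).

WRITE_BUFFER_LIMIT_GB = 600  # current all-flash write buffer cap

def cache_device_breakdown(advertised_gb, factory_op_pct=0.28, gc_pct=0.07):
    """Rough split of where the capacity of a cache-tier SSD goes."""
    write_buffer = min(advertised_gb, WRITE_BUFFER_LIMIT_GB)
    dynamic_op = advertised_gb - write_buffer    # used by the SSD for endurance
    factory_op = advertised_gb * factory_op_pct  # extra cells beyond advertised capacity
    gc_reserve = advertised_gb * gc_pct          # garbage collection and other internals
    return {
        "write buffer (GB)": write_buffer,
        "dynamic over-provisioning (GB)": dynamic_op,
        "factory over-provisioning (GB)": factory_op,
        "garbage collection reserve (GB)": gc_reserve,
    }

def recommended_cache_gb(usable_capacity_gb):
    """The 10% cache-to-capacity rule of thumb."""
    return 0.10 * usable_capacity_gb

if __name__ == "__main__":
    for name, gb in cache_device_breakdown(800).items():
        print(f"{name}: {gb:.0f}")
    print(f"Cache for 20TB usable capacity: {recommended_cache_gb(20_000):.0f} GB")

For the 800GB device in the example this yields the 600GB buffer plus the 200GB that the SSD uses for endurance, exactly as described above.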

So let's recap: as a consumer/admin, the 600GB write buffer limit should not be a concern. Although the write buffer is limited in terms of buffer capacity, the flash cells will not go unused, and the rule of thumb as such remains unchanged: a 10% cache-to-capacity ratio. Let's hope this puts this (non-)discussion to rest once and for all.

How HA handles a VSAN Stretched Cluster Site Partition

Duncan Epping · Apr 25, 2016 ·

Over the past couple of weeks I have had some interesting questions from folks about different VSAN Stretched Cluster failure scenarios, in particular about site partitions and how HA and VSAN know which VMs to fail over and which VMs to power off. There are a couple of things I would like to clarify. First, let's start with a diagram that sketches a stretched scenario. In the diagram below you see three sites: two “data” sites and one which is used for the “witness”. This is a standard VSAN Stretched configuration.

[Diagram: VSAN Stretched Cluster with two data sites and a witness site]

The typical question now is: what happens when Site 1 is isolated from Site 2 and from the Witness Site, while the Witness and Site 2 remain connected? Is the isolation response triggered in Site 1? What happens to the workloads in Site 1? Are the workloads restarted in Site 2? If so, how does Site 2 know that the VMs in Site 1 are powered off? All very valid questions if you ask me. If you read the vSphere HA deepdive on this website closely, letter for letter, you will find all the answers in there, but let's make it a bit easier for those who don't have the time.

First of all, all the VMs running in Site 1 will be powered off. Let it be clear that this is not done by vSphere HA; it is not the result of an “isolation”, as technically the hosts are not isolated but partitioned. The VMs are killed by a VSAN mechanism, and they are killed because the VMs no longer have access to any of their components. (The local components are not accessible as there is no quorum.) You can disable this mechanism through the advanced host settings, although I discourage you from doing so: set the advanced host setting called VSAN.AutoTerminateGhostVm to 0.
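To make the behaviour a bit more tangible, here is a toy model of that decision in Python. This is purely illustrative and not how VSAN is implemented; it simply assumes that any non-zero value of VSAN.AutoTerminateGhostVm leaves the mechanism enabled and that "quorum" means holding more than half of the votes for the VM's objects.

# Toy model of the "ghost VM" termination decision described above.
# Illustrative only; this is not the actual VSAN implementation.

def should_terminate_ghost_vm(partition_votes, total_votes, auto_terminate_ghost_vm=1):
    """Kill the local VM if this partition lost quorum on the VM's objects
    and the VSAN.AutoTerminateGhostVm advanced setting is not disabled."""
    has_quorum = partition_votes > total_votes / 2
    return (not has_quorum) and auto_terminate_ghost_vm != 0

# Site 1 after the partition holds 1 of 3 votes, so it has no quorum.
print(should_terminate_ghost_vm(partition_votes=1, total_votes=3))   # True  -> VM is killed
print(should_terminate_ghost_vm(1, 3, auto_terminate_ghost_vm=0))    # False -> mechanism disabled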

In the second site a new HA master node will be elected. That master node will validate which VMs are supposed to be powered on; it knows this through the “protectedlist”. The VMs that were running in Site 1 will be missing: they are on the list, but not powered on within this partition. As this partition has ownership of the components (quorum), it is now capable of powering on those VMs.

Finally, how do the hosts in Partition 2 know that the VMs in Partition 1 have been powered off? Well, they don't. However, Partition 2 has quorum (meaning it holds the majority of the votes/components, 2 out of 3) and as such ownership, and the hosts know this means it is safe to power on those VMs, as the VMs in Partition 1 will be killed by the VSAN mechanism.
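The restart side can be sketched the same way. Again, this is a toy model rather than actual HA code: the master simply treats every VM on the protectedlist that is not running in its own partition as a restart candidate, provided the partition holds quorum.

# Toy sketch of the restart decision made in Partition 2 (not actual HA code).

def restart_candidates(protected_list, powered_on_in_partition, partition_votes, total_votes):
    """VMs on the protectedlist that are not running in this partition are
    restarted, but only if the partition owns the objects (holds quorum)."""
    has_quorum = partition_votes > total_votes / 2   # e.g. 2 out of 3
    if not has_quorum:
        return []
    return [vm for vm in protected_list if vm not in powered_on_in_partition]

protected = ["vm-01", "vm-02", "vm-03"]
running_here = {"vm-03"}   # vm-01 and vm-02 were running in Site 1
print(restart_candidates(protected, running_here, partition_votes=2, total_votes=3))
# ['vm-01', 'vm-02']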

I hope that helps. For more details, make sure to read the clustering deepdive, which can be downloaded here for free.

How is Virtual SAN doing? 3500 customers reached!

Duncan Epping · Apr 21, 2016 ·

Are you wondering how Virtual SAN is doing? The recent earnings announcement revealed that… Virtual SAN is doing GREAT! Over 3500 customers so far (21 months after the release!) and 200% year-over-year growth. I loved how Pat Gelsinger described Virtual SAN: “VMware’s simple enterprise grade native storage for vSphere”. It doesn’t get more accurate and to the point than that, and that is how people should look at it. vSphere native storage: it just works. Here are just a couple of things from the earnings call (transcript here) that I think stood out with regards to VSAN:

and I – having been three years at EMC as a storage company, part of it is it just takes a while to get a storage product mature, right, and that – we have crossed the two-year cycle on VSAN now. The 6.2 release, as I would say, checks all the boxes with regard to key features, capabilities and so on, and we are, I’ll say right on schedule, right, we’re seeing the inflection point on that business, and the 6.2 release really hit the mark in the marketplace very well.

I’d say we’re clearly now seen as number one in a hyper-converged infrastructure space, and that software category we think is going to continue to really emerge as a powerful trend in the industry.

I think Zane mentioned a large financial services company. We had a large EMEA retailer, a large consumer goods manufacturer, a large equipment engines company, and each one of these is really demonstrating the power of the technology.

We also had good transactional bookings as well, so it wasn’t just in big deals but also transactional performance was good. So the channel participation is increasing here.

So we really left Q1 feeling really good about this area, and I’m quite bullish about its growth potential through the year and 2017 and beyond.

I think I don’t need to add anything other than… Go VSAN!

Some recent updates to HA deepdive

Duncan Epping · Apr 15, 2016 ·

Just a short post to point out that I updated the VVol section in the HA Deepdive. If you downloaded it already, make sure to grab the latest version. Note that I have added a version number to the intro and a changelog at the end so you can see what has changed. I also recommend subscribing to it, as I plan to do some more updates in the upcoming months. For this update I have been playing with a Nimble (virtual) array all day, which allowed me to create some cool screenshots of how HA works in a VVol environment. I was also seriously impressed by how easy the Nimble (virtual) array was to set up and how simple VVol was to configure on it. Not just that: the number of policy options Nimble exposes amazed me. Below is just an example of some of the things you can configure!

[Screenshot: Nimble VVol policy configuration options]

The screenshot below shows the Virtual Volumes created for a VM; this is the view from a storage perspective:
[Screenshot: Virtual Volumes created for a VM, viewed from the array side]

Can I still provision VMs when a VSAN Stretched Cluster site has failed?

Duncan Epping · Apr 13, 2016 ·

A question was asked internally: can you still provision VMs when a site has failed in a VSAN stretched cluster environment? In a regular VSAN environment, when you don't have sufficient fault domains you cannot provision new VMs, unless you explicitly enable Force Provisioning, which most people do not have enabled. In a VSAN stretched cluster environment this behaviour is different. In my case I tested what would happen if the witness appliance were gone. I had already created a VM before I failed the witness appliance, and I powered it on afterwards, just to see if that worked. Well, it did, and if you look at the VM at a component level you can see that the witness component is missing.

The next test was to create a new VM while the Witness Appliance is down. That also worked, although vCenter notified me during the provisioning process that there were fewer fault domains than expected, as shown in the screenshot below. This is the difference with a normal VSAN environment: here we actually do allow you to provision new workloads, mainly because the site could be down for a longer period of time.
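To make that difference explicit, here is a toy model in Python of the provisioning decision as described above. It illustrates the behaviour only, it is not VSAN code, and the parameter names are hypothetical ones I picked for the sketch.

# Toy model of the provisioning behaviour described above (illustrative only).

def can_provision(fault_domains_available, fault_domains_required,
                  stretched_cluster=False, force_provisioning=False):
    """A standard VSAN cluster refuses to provision when fault domains are
    missing unless Force Provisioning is enabled; a stretched cluster
    provisions anyway (with a warning), since a site or the witness may
    stay down for a long time."""
    if fault_domains_available >= fault_domains_required:
        return True
    return stretched_cluster or force_provisioning

# Witness appliance down: 2 of the 3 fault domains remain.
print(can_provision(2, 3))                           # False - standard cluster blocks it
print(can_provision(2, 3, force_provisioning=True))  # True  - explicit override
print(can_provision(2, 3, stretched_cluster=True))   # True  - stretched cluster allows it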

The next step was to power on the just-created VM and then look at the components. The power-on works without any issues and, as shown below, the VM is created in the Preferred site with a single component. As soon as the Witness recovers, the remaining components are created and synced.

Good to see that provisioning and power-on actually work and that the behaviour for this specific use case was changed. If you want to know more about VSAN stretched clusters, there are a bunch of articles on the topic to be found here, and there is also a deepdive white paper available here.

