
Yellow Bricks

by Duncan Epping


metro cluster

Can I still provision VMs when a vSAN Stretched Cluster site has failed? Part II

Duncan Epping · Dec 18, 2019

Three years ago I wrote the following post: Can I still provision VMs when a vSAN Stretched Cluster site has failed? Last week I received a question on this subject, and although officially I am not supposed to work on vSAN for the upcoming three months, I figured I could easily test this in an evening within 30 minutes. The question was simple: in that blog I described the failure of the witness host, but what happens if a single host fails in one of the two “data” fault domains? If I want to create a snapshot, for instance, will that still work?

So here’s what I tested:

  • vSAN Stretched Cluster
  • 4+4+1 configuration
    • Meaning, 4 hosts in each “data site” and a witness host, so 8 data hosts in my vSAN cluster plus the witness
  • Create a VM with cross-site protection and RAID-5 within the location

I first failed a host in one of the two data sites. With that host down, this is what happens when I create a VM with RAID-1 across sites and RAID-5 within a site:

  • Without “Force Provisioning” enabled, the creation of the VM fails
  • With “Force Provisioning” enabled, the creation of the VM succeeds; the VM is created with a RAID-0 within 1 location

Okay, so this is similar to the scenario originally described in my 2016 blog post, where I failed the witness: vSAN will create a RAID-0 configuration for the VM. When the host returns for duty, the RAID-1 across locations and RAID-5 within each location are then automatically created. On top of that, you can snapshot VMs in this scenario; the snapshots will also be created as RAID-0. One thing to keep in mind: I would recommend removing “Force Provisioning” from the policy after the failure has been resolved! Below is a screenshot of the component layout for this scenario, by the way.

I also retried the witness host down scenario, and in that case you do not need to use the “Force Provisioning” option. One more thing to note: the above will only happen when you request a RAID configuration that is impossible to create as a result of the failure. If 1 host fails in a 4+4+1 stretched cluster and you want to create a RAID-1 across sites with a RAID-1 within sites, the VM will be created with the requested RAID configuration, as demonstrated in the screenshot below.
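To make this concrete, below is a minimal Python sketch of the placement decision as I observed it. This is a toy model of my own, not actual vSAN logic; the only real constraint it encodes is that a RAID-5 leg needs 4 hosts in a site (3 data components plus 1 parity).

    # Toy model of the provisioning behavior described above (not vSAN code).
    HOSTS_FOR_RAID5 = 4  # a RAID-5 leg needs 3 data components + 1 parity

    def placement(healthy_hosts_per_site, force_provisioning=False):
        """Return the layout a new VM would get, or None if creation fails."""
        # Requested policy: RAID-1 across sites, RAID-5 within each site.
        if all(hosts >= HOSTS_FOR_RAID5 for hosts in healthy_hosts_per_site):
            return "RAID-1 across sites, RAID-5 within each site"
        # Policy cannot be met: only Force Provisioning allows creation,
        # and the VM comes up as a RAID-0 in a single location. vSAN
        # repairs it to the full policy once the failed host returns.
        if force_provisioning:
            return "RAID-0 within 1 location"
        return None  # provisioning fails

    # 4+4 data hosts, one host failed in the first site:
    print(placement([3, 4]))                           # None -> creation fails
    print(placement([3, 4], force_provisioning=True))  # RAID-0 fallback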

New whitepaper available: vSphere Metro Storage Cluster Recommended Practices (6.5 update)

Duncan Epping · Oct 24, 2017

I have had many requests for an updated version of this paper, so I have been working hard on it for the past couple of weeks. The paper was outdated, as it had last been updated around the vSphere 6.0 timeframe, and even that was only a minor update. I looked at every single section and added new statements and guidance, for instance around vSphere HA Restart Priority. So for those running a vSphere Metro Storage Cluster / stretched cluster of some kind, please read the brand new vSphere Metro Storage Cluster Recommended Practices (6.5 update) white paper.

It is available on storagehub.vmware.com as a PDF and for reading within your browser. If you have any questions or comments, please do not hesitate to leave them here.

  • vSphere Metro Storage Cluster Recommended Practices online
  • vSphere Metro Storage Cluster Recommended Practices PDF


VMs not getting killed after vMSC partition has lifted

Duncan Epping · Jan 12, 2017

I was talking to a VMware partner over the past couple of weeks about challenges they had in a new vSphere Metro Storage Cluster (vMSC) environment. In their particular case they simulated a site partition. During the site partition three things were expected to happen:

  • VMs that were impacted by APD (or PDL) should be killed by vSphere HA Component Protection
    • If HA Component Protection does not work, vSphere should kill the VMs when the partition is lifted
  • VMs should be restarted by vSphere HA

The problems faced were two-fold. VMs were restarted by vSphere HA, however:

  • vSphere HA Component Protection did not kill the VMs
  • When the partition was lifted vSphere did not kill the VMs which had lost the lock to the datastore either

It took a while before we figured out what was going on, at least for one of the problems. Let's start with the second problem: why aren't the VMs killed when the partition is lifted? vSphere should do this automatically, and it does, but only when there is a guest operating system installed and I/O is issued. As soon as an I/O is issued by the VM, vSphere will notice that the lock on the disk has been lost and obtained by another host, and it will kill the VM. If you have an “empty VM” this won't happen, as there will not be any I/O to the disk. (I have filed a feature request to kill such VMs as well, even without disk I/O or without a disk.) So how do you solve this? If you do any type of vSphere HA testing (with or without vMSC), make sure to install a guest OS so the test resembles real life; a minimal I/O generator is sketched below.
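For completeness, here is a trivial sketch of such a guest-side I/O generator (my own example, assuming Python is available in the guest OS): it issues a flushed write every few seconds so the hypervisor sees real disk I/O and can detect the lost datastore lock.

    # Minimal guest-side I/O generator for HA/vMSC failover testing.
    # Illustrative only: pick a path that is backed by the virtual disk
    # (not a RAM-backed tmpfs), otherwise no I/O ever reaches the host.
    import os
    import time

    def generate_io(path="/var/tmp/ha-test-io.bin", interval=5):
        counter = 0
        while True:
            with open(path, "wb") as f:
                f.write(f"heartbeat {counter}\n".encode())
                f.flush()
                os.fsync(f.fileno())  # force the write down to the disk
            counter += 1
            time.sleep(interval)

    if __name__ == "__main__":
        generate_io()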

Now back to the first problem. The fact that vSphere HA Component Protection does not kick in is still being debated, but I think there is a very specific reason for it. vSphere HA Component Protection (VMCP) is a feature that kills VMs on a host so they can be restarted elsewhere when an APD or PDL scenario has occurred. However, it will only do this when:

  • It is certain the VM can be restarted on the other side (Conservative setting)
  • There are healthy hosts in the other partition, or their state is unknown (Aggressive setting)

The first one is clear, I guess (more info about this here), but what does the second one mean? Well, basically there are three options:

  • Availability of healthy host: Yes >> Terminate
  • Availability of healthy host: No >> Don’t Terminate
  • Availability of healthy host: Unknown >> Terminate

So in the case where you have VMCP set to “Aggressively” fail over VMs, it will only do so when it knows hosts are available in the other site, or when it does not know the state of the hosts in the other site. If for whatever reason the hosts are deemed unhealthy, the answer to the question of whether there are healthy hosts available will be “No”, and as such the VMs will not be killed by VMCP. The question remains: why are these hosts reported as “unhealthy” in this partition scenario? That is something we are now trying to figure out. Potentially it could be caused by misconfigured heartbeat datastores, but this is still to be confirmed. If I learn more, I will update this article.

Update: I just received confirmation from development. Heartbeat datastores need to be available in both sites for vSphere HA to identify this scenario correctly. If there are no heartbeat datastores available in both sites, it can happen that no hosts are marked as healthy, which means that VMCP will not instantly kill those VMs when the APD has occurred.
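To illustrate the decision logic, here is a tiny Python sketch of the table above (a toy model of my own, not vSphere HA code), including the heartbeat datastore dependency that development confirmed:

    # Toy model of the aggressive VMCP terminate decision (not HA code).
    def healthy_host_availability(healthy_hosts_in_other_partition,
                                  heartbeat_ds_in_both_sites):
        """Classify the other partition as 'yes', 'no', or 'unknown'."""
        if not heartbeat_ds_in_both_sites:
            # Without heartbeat datastores in both sites, hosts may not be
            # marked healthy at all, so the answer becomes "no".
            return "no"
        if healthy_hosts_in_other_partition is None:
            return "unknown"  # state of the other partition is undetermined
        return "yes" if healthy_hosts_in_other_partition > 0 else "no"

    def vmcp_terminates(availability):
        # Yes -> terminate, No -> don't terminate, Unknown -> terminate.
        return availability in ("yes", "unknown")

    print(vmcp_terminates(healthy_host_availability(4, True)))     # True
    print(vmcp_terminates(healthy_host_availability(None, True)))  # True
    print(vmcp_terminates(healthy_host_availability(4, False)))    # False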

vSphere Metro Storage Cluster with vSphere 6.0 paper released

Duncan Epping · Jul 8, 2015

I had already blogged about this on the VMware blog, but I figured I would share it here as well. The vSphere Metro Storage Cluster with vSphere 6.0 white paper has been released. I worked on this paper together with my friend Lee Dilworth; it is an updated version of the paper we did in 2012. It contains all of the new best practices for vSphere 6.0 when it comes to vSphere Metro Storage Cluster implementations, so if you are looking to implement one or upgrade an existing environment, make sure to read it!

VMware vSphere Metro Storage Cluster Recommended Practices

VMware vSphere Metro Storage Cluster (vMSC) is a specific configuration within the VMware Hardware Compatibility List (HCL). These configurations are commonly referred to as stretched storage clusters or metro storage clusters and are implemented in environments where disaster and downtime avoidance is a key requirement. This best practices document was developed to provide additional insight and information for operation of a vMSC infrastructure in conjunction with VMware vSphere. This paper explains how vSphere handles specific failure scenarios, and it discusses various design considerations and operational procedures. For detailed information about storage implementations, refer to documentation provided by the appropriate VMware storage partner.

vSphere HA respecting VM-Host should rules?

Duncan Epping · Mar 5, 2015

A long time ago I authored this white paper on stretched clusters. During our testing, the one thing where we felt HA was lacking was the fact that it would not respect VM-Host should rules. So if you had these configured in a cluster and a host failed, VMs could be restarted on ANY given host in the cluster. The first time DRS would then run, it would move the VMs back to where they belonged according to the configured VM-Host should rules.

I guess one of the reasons for this was the fact that the affinity and anti-affinity rules were originally designed to be DRS rules. Over time we realized that these are not DRS rules but rather cluster rules. Based on the findings we made while authoring the white paper, we filed a bunch of feature requests, and one of them just made vSphere 6.0. As of vSphere 6.0 it is possible to have vSphere HA respect VM-Host should rules through the use of an advanced setting called “das.respectVmHostSoftAffinityRules”.

When “das.respectVmHostSoftAffinityRules” is configured, vSphere HA will try to respect the rule when it can. So if any hosts in the cluster belong to the same VM-Host group, HA will restart the respective VM on one of those hosts. Of course, as this is a “should rule”, HA has the ability to ignore the rule when needed. You can imagine a scenario where none of the hosts in the VM-Host should rule are available; in that case HA will restart the VM on any other host in the cluster. Useful? Yes, I think so!
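For those who prefer to script it, a minimal pyVmomi sketch along the following lines should set the advanced option on a cluster. The vCenter address, credentials, and cluster name are placeholders of mine, so adjust them for your environment and test in a lab first:

    # Sketch: enable das.respectVmHostSoftAffinityRules on a cluster
    # via pyVmomi. Hostname, credentials, and cluster name are made up.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="secret",
                      sslContext=ssl._create_unverified_context())  # lab only
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == "StretchedCluster")
        view.Destroy()

        spec = vim.cluster.ConfigSpecEx(
            dasConfig=vim.cluster.DasConfigInfo(option=[
                vim.OptionValue(key="das.respectVmHostSoftAffinityRules",
                                value="true")]))
        # Reconfigure the cluster; wait for the task before relying on it.
        cluster.ReconfigureComputeResource_Task(spec, modify=True)
    finally:
        Disconnect(si)

The same setting can of course simply be added in the vSphere Web Client, under the cluster's vSphere HA advanced options.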

