BC-DR

vSphere 4 U2 and recovering from HA Split Brain

Duncan Epping · Jul 2, 2010 ·

A couple of months ago I wrote this article about a future feature that would enable HA to recover from a Split Brain scenario. vSphere 4.0 Update 2 was recently released, but neither the release notes nor the documentation mention this new feature.

I had never noticed this until I was having a discussion around this feature with one of my colleagues. I asked our HA Product Manager and one of our developers, and it appears that this has mysteriously slipped out of the release notes. As I personally believe that this is a very important feature of HA, I wanted to rehash some of the info stated in that article. I did rewrite it slightly though. Here we go:

One of the most common issues experienced in an iSCSI/NFS environment with VMware HA pre vSphere 4.0 Update 2 is a split brain situation.

First let me explain what a split brain scenario is. Let's start by describing the situation which is most commonly encountered:

4 Hosts
iSCSI / NFS based storage
Isolation response: leave powered on

When one of the hosts is completely isolated, including the Storage Network, the following will happen:

Host ESX001 is completely isolated, including the storage network (remember iSCSI/NFS based storage!), but the VMs will not be powered off because the isolation response is set to “leave powered on”.
After 15 seconds the remaining, non-isolated, hosts will try to restart the VMs.
Because the iSCSI/NFS network is also isolated, the lock on the VMDK will time out and the remaining hosts will be able to boot up the VMs.
When ESX001 returns from isolation it will still have the VMX Processes running in memory and this is when you will see a “ping-pong” effect within vCenter, in other words VMs flipping back and forth between ESX001 and any of the other hosts.

As of version 4.0 Update 2 ESX(i) detects that the lock on the VMDK has been lost and issues a question which is automatically answered. The VM will be powered off to recover from the split-brain scenario and to avoid the ping-pong effect. Please note that HA will generate an event for this auto-answer which is viewable within vCenter.

Don’t you just love VMware HA!

How does “das.maxvmrestartcount” work?

Duncan Epping · Jun 30, 2010 ·

The number of times HA retries a failed virtual machine restart is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. My colleague Hugo Strydom wrote about this a while ago, and after a short discussion with one of our developers I realised Hugo’s article was not 100% correct. The default value is 5. Pre vCenter 2.5 U4, HA would keep retrying forever, which could lead to serious problems as described in KB article 1009625, where multiple virtual machines would be registered on multiple hosts simultaneously, leading to a confusing and inconsistent state. (http://kb.vmware.com/kb/1009625)

Important to note is that HA will try to start the virtual machine on one of the hosts in the affected cluster; if this is unsuccessful on that host the restart count will be increased by 1. The first restart retry will then occur after two minutes. If that one fails the next will occur after 4 minutes, and if that one fails the following will occur after 8 minutes, and so on every 8 minutes until the “das.maxvmrestartcount” has been reached.

To make it clearer, look at the following (each delay is measured from the previous attempt):

T+0 – Restart
T+2 – Restart retry 1
T+4 – Restart retry 2
T+8 – Restart retry 3
T+8 – Restart retry 4
T+8 – Restart retry 5

In other words, it could take up to 30 minutes (2 + 4 + 8 + 8 + 8) before a successful restart has been initiated when using the default of “5” restarts max. If you increase that number, each additional retry will also be “T+8” again.
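The retry timing described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not HA's actual code: the helper name restart_delays and its parameters are made up for this example, and each delay is treated as the wait since the previous attempt.

```python
from itertools import accumulate


def restart_delays(max_restarts=5, first_delay=2, cap=8):
    """Return the wait in minutes before each restart retry.

    Hypothetical helper: the wait doubles after every failed retry
    until it reaches `cap`, and stays at `cap` from then on.
    """
    delays = []
    delay = first_delay
    for _ in range(max_restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


delays = restart_delays()            # waits between attempts
elapsed = list(accumulate(delays))   # minutes since the initial restart
print(delays, elapsed)
```

With the default of 5 the last value in elapsed is 30, which matches the "up to 30 minutes" mentioned above. Calling restart_delays(7) simply adds two more 8 minute waits.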

HA: Max amount of host failures?

Duncan Epping · Jun 18, 2010 ·

A colleague had a question around the maximum number of host failures HA could take. The availability guide states the following:

The maximum Configured Failover Capacity that you can set is four. Each cluster has up to five primary hosts and if all fail simultaneously, failover of all hosts might not be successful.

However, when you select the “Percentage” admission control policy you can set it to 50% even when you have 32 hosts in a cluster. That means that the amount of failover capacity being reserved equals 16 hosts.

Although this is fully supported, there is a caveat of course. The number of primary nodes is still limited to five. Even if you have the ability to reserve more than 5 hosts as spare capacity, that does not guarantee a restart. If, for whatever reason, half of your 32-host cluster fails and those 5 primaries happen to be part of the failed hosts, your VMs will not restart. (One of the primary nodes coordinates the fail-over!) Although the “percentage” option enables you to save additional spare capacity, there’s always the chance all primaries fail.
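To get a feel for how likely that worst case is, here is a rough back-of-the-envelope sketch. It assumes the failed hosts are a uniformly random subset of the cluster, which is my simplification and not something HA guarantees; in real life failures tend to be correlated (think of a chassis or a rack), so the real risk can be a lot higher. The function name is made up for this example.

```python
from math import comb


def prob_all_primaries_fail(total_hosts, failed_hosts, primaries=5):
    """Chance that every primary node is among the failed hosts.

    Assumes (hypothetically) that failures hit hosts uniformly at random.
    """
    if failed_hosts < primaries:
        return 0.0
    # Choose the remaining failed hosts from the non-primaries,
    # divided by all possible ways to pick the failed hosts.
    return comb(total_hosts - primaries, failed_hosts - primaries) / comb(
        total_hosts, failed_hosts
    )


reserved_hosts = int(32 * 0.50)   # 50% of a 32-host cluster
p = prob_all_primaries_fail(32, reserved_hosts)
print(f"{reserved_hosts} hosts reserved, P(all primaries lost) = {p:.2%}")
```

Under that assumption the chance of losing all five primaries when 16 of 32 hosts fail is a little over 2%. Small, but definitely not zero.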

All in all, I still believe the Percentage admission control policy provides you more flexibility than any other admission control policy.

Which host is selected for an HA initiated restart?

Duncan Epping · Jun 16, 2010 ·

Got asked the following question today and thought it was valuable for everyone to know the answer to this:

How is a host selected for VM placement when HA restarts VMs from a failed host?

It’s actually a really simple mechanism. HA keeps track of the unreserved capacity of each host of the cluster. When a fail-over needs to occur the hosts are ordered, with the host with the highest amount of unreserved capacity being the first option. Now to make it absolutely crystal clear, HA keeps track of the unreserved capacity and it is not DRS which does this. HA works completely independently of vCenter and as we all know DRS is part of vCenter. HA also works when DRS is disabled or unlicensed!

Now one thing to note is that HA will also verify if the host is compatible with the VM or not. What this means is that HA will verify if the VM’s network is available on the target host and if the datastore is available on the target host. If both are the case a restart will be initiated on that host. To summarize:

Order available hosts based on unreserved capacity
Check compatibility (VM Network / Datastore)
Boot up!
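The three steps above could be sketched roughly like this. Keep in mind this is not how HA is implemented internally: the Host and VM classes, the capacity units and the host names are all made up for illustration, and moving on to the next host when the compatibility check fails is my assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    unreserved_capacity: int                   # arbitrary units, hypothetical
    networks: set = field(default_factory=set)
    datastores: set = field(default_factory=set)


@dataclass
class VM:
    name: str
    network: str
    datastore: str


def pick_restart_host(vm, hosts):
    """Return the first compatible host, trying the most unreserved capacity first."""
    ordered = sorted(hosts, key=lambda h: h.unreserved_capacity, reverse=True)
    for host in ordered:
        # Compatibility check: VM network and datastore must both be available.
        if vm.network in host.networks and vm.datastore in host.datastores:
            return host
    return None  # no compatible host found


hosts = [
    Host("esx002", 40, {"VM Network"}, {"ds01"}),
    Host("esx003", 70, {"VM Network"}, {"ds02"}),  # most capacity, wrong datastore
    Host("esx004", 55, {"VM Network"}, {"ds01"}),
]
vm = VM("web01", "VM Network", "ds01")
print(pick_restart_host(vm, hosts).name)
```

In this example esx003 has the most unreserved capacity but does not see the datastore, so the VM ends up on esx004.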

VM Monitoring (aka VM HA) heartbeat

Duncan Epping · Jun 4, 2010 ·

I got a question around VM Monitoring (aka virtual machine level HA) this week. A customer wanted to test if VM Monitoring worked and as such disabled the NIC of the virtual machine and waited for 30 seconds for the VM Monitoring response to kick in…. nothing happened.

VM Monitoring restarts individual virtual machines when needed. It uses a concept similar to HA: heartbeats. If heartbeats, in this case VMware Tools heartbeats, are not received for a specific amount of time, the virtual machine will be rebooted. An example of when this will happen is when a Windows virtual machine shows a BSOD.

The big question of course was why didn’t this trigger a response?

The answer is simple: The VMware Tools heartbeat does not use the virtual machine NIC. This heartbeat is “caught” by hostd and passed on to vCenter. vCenter uses this to show those “green/yellow/red” alarm dots. The same heartbeat is used by VM Monitoring to detect the failure of a virtual machine. Even without any NIC attached to your virtual machine these heartbeats will still be received.

One thing to keep in mind though: heartbeats are sent out every second by default, and when they are no longer received VM Monitoring will first check whether there is any network or storage I/O, to avoid false positives.
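The decision described above boils down to something like the following sketch. The function and parameter names are hypothetical, and the 30 second failure interval is just the value from the customer's test, not something taken from the product documentation.

```python
def should_reset_vm(seconds_since_heartbeat, failure_interval, recent_io):
    """Reset only when heartbeats are missing for the whole interval
    and no network or storage I/O has been observed (hypothetical logic)."""
    heartbeat_lost = seconds_since_heartbeat >= failure_interval
    return heartbeat_lost and not recent_io


# NIC disabled, but VMware Tools is still sending heartbeats every second.
nic_disabled = should_reset_vm(1, failure_interval=30, recent_io=False)

# BSOD: no heartbeats and no I/O for longer than the interval.
bsod = should_reset_vm(45, failure_interval=30, recent_io=False)

# Heartbeats lost, but the VM is still doing disk I/O: possible false positive.
tools_hung = should_reset_vm(45, failure_interval=30, recent_io=True)

print(nic_disabled, bsod, tools_hung)
```

Only the BSOD case results in a reset, which is exactly why disabling the NIC in the customer's test did not trigger anything.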

Question for you guys! One thing that I always wondered is how many people use VM Monitoring? And if you use it, do you use it on all VMs in every cluster?