
Yellow Bricks

by Duncan Epping


vSphere 4 U2 and recovering from HA Split Brain

Duncan Epping · Jul 2, 2010 ·

A couple of months ago I wrote this article about a future feature that would enable HA to recover from a split-brain scenario. vSphere 4.0 Update 2 was recently released, but neither the release notes nor the documentation mention this new feature.

I had never noticed this until I was discussing the feature with one of my colleagues. I asked our HA Product Manager and one of our developers, and it appears that it mysteriously slipped out of the release notes. As I personally believe this is a very important HA feature, I wanted to rehash some of the info from that article, rewritten slightly. Here we go:

One of the most common issues experienced in an iSCSI/NFS environment with VMware HA before vSphere 4.0 Update 2 is a split-brain situation.

First let me explain what a split-brain scenario is. Let's start with the situation that is most commonly encountered:

  • 4 Hosts
  • iSCSI / NFS based storage
  • Isolation response: leave powered on

When one of the hosts is completely isolated, including the Storage Network, the following will happen:

  1. Host ESX001 is completely isolated, including the storage network (remember, iSCSI/NFS-based storage!), but the VMs will not be powered off because the isolation response is set to “leave powered on”.
  2. After 15 seconds the remaining, non-isolated hosts will try to restart the VMs.
  3. Because the iSCSI/NFS network is also isolated, the lock on the VMDK will time out and the remaining hosts will be able to power on the VMs.
  4. When ESX001 returns from isolation, it will still have the VMX processes running in memory, and this is when you will see a “ping-pong” effect within vCenter; in other words, VMs flipping back and forth between ESX001 and the other hosts.

As of version 4.0 Update 2, ESX(i) detects that the lock on the VMDK has been lost and raises a question that is automatically answered. The VM will be powered off to recover from the split-brain scenario and to avoid the ping-pong effect. Please note that HA generates an event for this auto-answer, which is viewable within vCenter.
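
For those who want to check or change the isolation response described above programmatically, here is a minimal pyVmomi (Python) sketch. This is my own illustration, not an official VMware example; the vCenter address, credentials and cluster name are placeholders. Note that in the API the value "none" corresponds to "leave powered on".

from pyVim.connect import SmartConnect
from pyVmomi import vim

# Placeholder connection details -- replace with your own vCenter and credentials.
si = SmartConnect(host="vcenter.example.local", user="administrator", pwd="********")
content = si.RetrieveContent()

# Find the cluster by name (hypothetical name "Cluster01").
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")

# Show the current cluster default isolation response; "none" means "leave powered on".
defaults = cluster.configurationEx.dasConfig.defaultVmSettings
if defaults:
    print("Current isolation response:", defaults.isolationResponse)

# Set the cluster default isolation response to "leave powered on".
spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(
        defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse="none")))
cluster.ReconfigureComputeResource_Task(spec, modify=True)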

Don’t you just love VMware HA!

How does “das.maxvmrestartcount” work?

Duncan Epping · Jun 30, 2010 ·

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. My colleague Hugo Strydom wrote about this a while ago, and after a short discussion with one of our developers I realised Hugo’s article was not 100% correct. The default value is 5. Pre-vCenter 2.5 U4, HA would keep retrying forever, which could lead to serious problems as described in KB article 1009625, where multiple virtual machines would be registered on multiple hosts simultaneously, leading to a confusing and inconsistent state. (http://kb.vmware.com/kb/1009625)
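
As a side note, setting this advanced option can also be scripted. Below is a minimal pyVmomi (Python) sketch; the vCenter details and cluster name are placeholders, and this is only an illustration of the API call, not an official example.

from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator", pwd="********")  # placeholders
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")  # hypothetical cluster name

# Set the HA advanced option "das.maxvmrestartcount".
# Note: depending on the vSphere version the list below may replace the existing
# advanced options, so include any other options you want to keep.
spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(
        option=[vim.option.OptionValue(key="das.maxvmrestartcount", value="5")]))
cluster.ReconfigureComputeResource_Task(spec, modify=True)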

Important to note is that HA will try to restart the virtual machine on one of the hosts in the affected cluster; if this is unsuccessful on that host, the restart count is increased by 1. The first restart retry will then occur after two minutes. If that one fails, the next will occur after 4 minutes, and if that one fails, the following will occur after 8 minutes, until “das.maxvmrestartcount” has been reached.

To make it more clear, look at the following (each value is the delay, in minutes, before that attempt):

  • T+0 – Restart
  • T+2 – Restart retry 1
  • T+4 – Restart retry 2
  • T+8 – Restart retry 3
  • T+8 – Restart retry 4
  • T+8 – Restart retry 5

In other words, it could take up to 30 minutes before a successful restart has been initiated when using the default maximum of 5 restarts. If you increase that number, each additional retry will also occur after another 8 minutes.
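
To see how those delays add up, here is a tiny Python sketch, purely illustrative and assuming the 2/4/8-minute delays described above:

# Delays, in minutes, before each HA restart retry with the default
# das.maxvmrestartcount of 5: 2, 4, then 8 minutes for every further retry.
max_vm_restart_count = 5
delays = [2, 4] + [8] * (max_vm_restart_count - 2)

elapsed = 0
print("initial restart attempt at T+0")
for retry, delay in enumerate(delays, start=1):
    elapsed += delay
    print(f"restart retry {retry}: {delay} minutes later, roughly {elapsed} minutes after the failure")

print(f"Worst case: about {elapsed} minutes until the last retry is initiated")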

Scripts for “Proactive DRS/DPM”

Duncan Epping · Jun 22, 2010 ·

I never noticed this set of scripts, to be honest, but Anne Holler (VMware employee) posted them about a year ago. The scripts change various DRS/DPM settings to proactively manage your environment, adjusting DRS and DPM behaviour based on the expected workload.

Proactive DRS:

  • setDRSAggressive.pl
    The script setDRSAggressive.pl sets various DRS operating parameters so that it will recommend rebalancing VMotions even when current VM demand does not make those moves appear worthwhile. As an example use case, if powerOnHosts.pl (see “Proactive DPM” posting) is used to trigger host power-ons at 8am before an expected steep increase in VM demand weekdays at 9am, setDRSAggressive.pl can also be scheduled to run at 8am to force rebalancing moves to the powered-on hosts.
  • setDRSDefault.pl
    The script setDRSDefault.pl resets DRS’ operating parameters so that it resumes its normal behaviour (i.e. the behaviour before setDRSAggressive.pl was used).
  • setMaxMovesPerHost.pl
    The script setMaxMovesPerHost.pl can be used to increase DRS’ limit on the number of VMotions it will recommend in each regular DRS invocation (by default every 5 minutes).

Proactive DPM:

  • powerOnHosts.pl
    The script powerOnHosts.pl changes cluster settings to engender recommendations to power on all standby hosts, and then disables DPM so that those hosts are kept on while demand remains low.
  • enableDPM.pl
    The script enableDPM.pl re-enables DPM to run in its normal reactive behavior. As an example use case, this script can be scheduled to run each weekday morning at (say) 10am (after full VM demand load is expected to be established) or at (say) 5pm (after full VM demand load is likely to diminish) to resume normal DPM operation.

I have had multiple customers ask me whether it was possible to schedule a change of the DRS and DPM configuration. My answer used to be “yes, you can script it”, but I never managed to find a script until I coincidentally bumped into these today.
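
If Perl is not your thing, the same kind of schedulable setting change can also be made against the vSphere API directly. Below is a minimal pyVmomi (Python) sketch of my own, not Anne's scripts: it disables DPM, similar in spirit to what powerOnHosts.pl does after the standby hosts have been powered on, and re-enables it again à la enableDPM.pl. The vCenter details and cluster name are placeholders, and the DRS aggressiveness tuning done by setDRSAggressive.pl would be applied in the same way via the drsConfig portion of the spec.

from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator", pwd="********")  # placeholders
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")  # hypothetical cluster name

def set_dpm(cluster, enabled):
    """Enable or disable DPM on the cluster (schedule this via cron, for example)."""
    spec = vim.cluster.ConfigSpecEx(
        dpmConfig=vim.cluster.DpmConfigInfo(enabled=enabled))
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)

set_dpm(cluster, False)   # e.g. at 8am: keep all hosts powered on for the expected load
# set_dpm(cluster, True)  # e.g. at 10am: resume normal, reactive DPM behaviour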

DRS Sub Cluster? vSphere 4.next

Duncan Epping · Jun 21, 2010 ·

On the community forums a question was asked about campus clusters and pinning VMs to a specific set of hosts. In vSphere 4.0 that is unfortunately not possible, and it definitely is a feature that many customers would want to use.

Banjot Chanana revealed during VMworld that it was an upcoming feature but did not go into much detail. However, on the community forums (thanks @lamw for pointing this out), Elisha just revealed the following:

Controls will be available in the upcoming vSphere 4.1 release to enable this behavior. You’ll be able to set “soft” (ie. preferential) or “hard” (ie. strict) rules associating a set of vms with a set of hosts. HA will respect the hard rules and only failover vms to the appropriate hosts.

Basically, these are DRS host affinity rules to which VMware HA adheres. I can’t wait for the upcoming vSphere version to be released and to figure out how all these nice “little” enhancements change our designs.
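
For reference once 4.1 ships, here is a rough pyVmomi (Python) sketch of what creating such a rule could look like through the API: a VM group, a host group, and a VM-to-host rule where mandatory=False gives you a "soft" (preferential) rule and mandatory=True a "hard" (strict) one. All names, VMs and hosts below are hypothetical, and the exact API may still differ until the release is out.

from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator", pwd="********")  # placeholders
content = si.RetrieveContent()

def find_all(vimtype):
    # Return all inventory objects of the given type.
    return content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True).view

cluster = next(c for c in find_all(vim.ClusterComputeResource) if c.name == "Cluster01")
vms = [v for v in find_all(vim.VirtualMachine) if v.name in ("vm01", "vm02")]       # hypothetical VMs
hosts = [h for h in find_all(vim.HostSystem) if h.name in ("esx001", "esx002")]     # hypothetical hosts

vm_group = vim.cluster.VmGroup(name="critical-vms", vm=vms)
host_group = vim.cluster.HostGroup(name="site-a-hosts", host=hosts)
rule = vim.cluster.VmHostRuleInfo(
    name="keep-critical-on-site-a",
    enabled=True,
    mandatory=False,                 # False = "soft"/preferential, True = "hard"/strict
    vmGroupName="critical-vms",
    affineHostGroupName="site-a-hosts")

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[vim.cluster.GroupSpec(operation="add", info=vm_group),
               vim.cluster.GroupSpec(operation="add", info=host_group)],
    rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
cluster.ReconfigureComputeResource_Task(spec, modify=True)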

HA: Max amount of host failures?

Duncan Epping · Jun 18, 2010 ·

A colleague had a question about the maximum number of host failures HA can handle. The Availability Guide states the following:

The maximum Configured Failover Capacity that you can set is four. Each cluster has up to five primary hosts and if all fail simultaneously, failover of all virtual machines might not be successful.

However, when you select the “Percentage” admission control policy you can set it to 50% even when you have 32 hosts in a cluster. That means the failover capacity being reserved equals the resources of 16 hosts.

Although this is fully supported, there is of course a caveat. The number of primary nodes is still limited to five. Even though you have the ability to reserve more than 5 hosts’ worth of spare capacity, that does not guarantee a restart. If, for whatever reason, half of your 32-host cluster fails and those 5 primaries happen to be among the failed hosts, your VMs will not be restarted. (One of the primary nodes coordinates the failover!) Although the “Percentage” option enables you to reserve additional spare capacity, there is always the chance that all primaries fail.

All in all, I still believe the Percentage admission control policy provides more flexibility than any other admission control policy.
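
For completeness, here is a minimal pyVmomi (Python) sketch of how the percentage-based admission control policy could be configured, reserving 50% of CPU and memory resources as in the 32-host example above. The vCenter details and cluster name are placeholders; this is an illustration, not an official example.

from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator", pwd="********")  # placeholders
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")  # hypothetical cluster name

# Reserve 50% of cluster CPU and memory resources as failover capacity,
# roughly the equivalent of 16 hosts in a 32-host cluster.
policy = vim.cluster.FailoverResourcesAdmissionControlPolicy(
    cpuFailoverResourcesPercent=50,
    memoryFailoverResourcesPercent=50)
spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(
        admissionControlEnabled=True,
        admissionControlPolicy=policy))
cluster.ReconfigureComputeResource_Task(spec, modify=True)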

