A couple of months ago I wrote this article about a future feature that would enable HA to recover from a Split Brain scenario. vSphere 4.0 Update 2 recently was released but the release notes or documentation did not mention this new feature.

I had never noticed this until I was having a discussion around this feature with one of my colleagues. I asked our HA Product Manager and one of our developers and it appears that this mysteriously has slipped the release notes. As I personally believe that this is a very important feature of HA I wanted to rehash some of the info stated in that article. I did rewrite it slightly though. Here we go:

One of the most common issues experienced in an iSCSI/NFS environment with VMware HA pre vSphere 4.0 Update 2 is a split brain situation.

First let me explain what a split brain scenario is, lets start with describing the situation which is most commonly encountered:

  • 4 Hosts
  • iSCSI / NFS based storage
  • Isolation response: leave powered on

When one of the hosts is completely isolated, including the Storage Network, the following will happen:

  1. Host ESX001 is completely isolated including the storage network(remember iSCSI/NFS based storage!) but the VMs will not be powered off because the isolation response is set to “leave powered on”.
  2. After 15 seconds the remaining, non isolated, hosts will try to restart the VMs.
  3. Because of the fact that the iSCSI/NFS network is also isolated the lock on the VMDK will time out and the remaining hosts will be able to boot up the VMs.
  4. When ESX001 returns from isolation it will still have the VMX Processes running in memory and this is when you will see a “ping-pong” effect within vCenter, in other words VMs flipping back and forth between ESX001 and any of the other hosts.

As of version 4.0 Update 2 ESX(i) detects that the lock on the VMDK has been lost and issues a question which is automatically answered. The VM will be powered off to recover from the split-brain scenario and to avoid the ping-pong effect. Please note that HA will generate an event for this auto-answer which is viewable within vCenter.

Don’t you just love VMware HA!