I had a question today about what the vSphere HA advanced setting das.maskCleanShutdownEnabled is for. I described earlier why it was introduced for stretched clusters, but will give a short summary here:
Two advanced settings were introduced in vSphere 5.0 Update 1 to enable HA to fail over virtual machines located on datastores that are in a Permanent Device Loss (PDL) state. This is very specific to stretched cluster environments. The first setting, “disk.terminateVMOnPDLDefault”, is configured at the host level. It is configured in /etc/vmware/settings and should be set to “True”. This setting ensures that a virtual machine is killed when the datastore it resides on is in a PDL state.
The second setting is a vSphere HA advanced setting called “das.maskCleanShutdownEnabled”. This setting is also not enabled by default and needs to be set to “True”. It allows HA to trigger a restart response for a virtual machine that has been killed automatically due to a PDL condition, because it lets HA differentiate between a virtual machine that was killed due to the PDL state and one that was powered off by an administrator.
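Since the two settings live in different places (a file on each host versus the HA cluster's advanced options), it can help to check both together. The following is a minimal Python sketch under stated assumptions: it assumes the host file uses simple “key = value” lines, and the helper names parse_settings and pdl_config_problems are hypothetical, not part of any VMware tool or API.

```python
from __future__ import annotations


def parse_settings(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines (file format assumed for illustration)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional quotes around the value, e.g. "TRUE".
        settings[key.strip()] = value.strip().strip('"')
    return settings


def pdl_config_problems(host_settings_text: str,
                        ha_advanced_options: dict[str, str]) -> list[str]:
    """Hypothetical checklist: report which of the two PDL settings are not True."""
    problems: list[str] = []
    host = parse_settings(host_settings_text)
    # Host level: contents of /etc/vmware/settings on each host.
    if host.get("disk.terminateVMOnPDLDefault", "").lower() != "true":
        problems.append("host: disk.terminateVMOnPDLDefault is not True in /etc/vmware/settings")
    # Cluster level: vSphere HA advanced options.
    if ha_advanced_options.get("das.maskCleanShutdownEnabled", "").lower() != "true":
        problems.append("cluster: das.maskCleanShutdownEnabled is not True")
    return problems


if __name__ == "__main__":
    host_file = "disk.terminateVMOnPDLDefault = True\n"
    print(pdl_config_problems(host_file, {"das.maskCleanShutdownEnabled": "True"}))  # []
```

An empty list means both settings are present and set to True; in a real cluster you would need to gather the host file from every host in the stretched cluster, not just one.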
But why is “das.maskCleanShutdownEnabled” needed for HA? From a vSphere HA perspective there are two different types of “operations”: a user-initiated power-off (clean) and a kill. When a virtual machine is powered off by a user, part of the process is setting the property “runtime.cleanPowerOff” to true.
Remember that when “disk.terminateVMOnPDLDefault” is configured, your VMs will be killed when they issue I/O. This is where the problem arises: in a PDL scenario it is impossible to set “runtime.cleanPowerOff”, as the datastore, and as such the vmx, is unreachable. As the property defaults to “true”, vSphere HA will assume the VMs were cleanly powered off and will not take any action in a PDL scenario. By setting “das.maskCleanShutdownEnabled” to true, you avoid a scenario where all VMs are killed but never restarted, as you are telling vSphere HA to assume that VMs were not shut down cleanly. In that case vSphere HA will assume VMs were killed UNLESS the property is set.
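To make the decision described above concrete, here is a minimal Python sketch. It is a simplified model of the behaviour as explained in this post, not HA's actual implementation: the function name ha_restarts_vm is hypothetical, and None is used to represent the case where “runtime.cleanPowerOff” could not be written because the datastore is in a PDL state.

```python
from typing import Optional


def ha_restarts_vm(clean_power_off: Optional[bool],
                   mask_clean_shutdown_enabled: bool) -> bool:
    """Simplified model: would vSphere HA restart this powered-off VM?

    clean_power_off is the value written to runtime.cleanPowerOff, or None
    when the write never happened (e.g. the datastore is in a PDL state).
    """
    if clean_power_off is None:
        # The property could not be written, so HA has to assume a value.
        # Without das.maskCleanShutdownEnabled the default ("true") applies,
        # meaning "clean power-off". With it, HA assumes the VM was killed.
        clean_power_off = not mask_clean_shutdown_enabled
    # Only VMs that were not cleanly powered off get a restart response.
    return not clean_power_off


if __name__ == "__main__":
    # Admin power-off: the property is written, so no restart either way.
    print(ha_restarts_vm(True, mask_clean_shutdown_enabled=True))    # False
    # PDL kill without the setting: VM is killed but never restarted.
    print(ha_restarts_vm(None, mask_clean_shutdown_enabled=False))   # False
    # PDL kill with the setting: HA treats it as a kill and restarts it.
    print(ha_restarts_vm(None, mask_clean_shutdown_enabled=True))    # True
```

The second case is exactly the “killed but never restarted” scenario; flipping only the cluster setting turns it into a restart while leaving administrator power-offs untouched.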
If you have a stretched cluster environment, make sure to configure these settings accordingly!