I had a question today about what the vSphere HA advanced setting das.maskCleanShutdownEnabled is for. I described why it was introduced for Stretched Clusters but will give a short summary here:
Two advanced settings were introduced in vSphere 5.0 Update 1 to enable HA to fail over virtual machines located on datastores that are in a Permanent Device Loss (PDL) state. This is very specific to stretched cluster environments. The first setting is configured at the host level and is “disk.terminateVMOnPDLDefault”. This setting can be configured in /etc/vmware/settings and should be set to “True”. It ensures that a virtual machine is killed when the datastore it resides on is in a PDL state.
The second setting is a vSphere HA advanced setting called “das.maskCleanShutdownEnabled”. This setting is also not enabled by default and needs to be set to “True”. It allows HA to trigger a restart response for a virtual machine which has been killed automatically due to a PDL condition, because it lets HA differentiate between a virtual machine which was killed due to the PDL state and a virtual machine which was powered off by an administrator.
But why is “das.maskCleanShutdownEnabled” needed for HA? From a vSphere HA perspective there are two different types of “operations”. The first is a user initiated power-off (clean) and the other is a kill. When a virtual machine is powered off by a user, part of the process is setting the property “runtime.cleanPowerOff” to true.
Remember that when “disk.terminateVMOnPDLDefault” is configured your VMs will be killed when they issue I/O. This is where the problem arises: in a PDL scenario it is impossible to set “runtime.cleanPowerOff”, as the datastore, and with it the vmx file, is unreachable. As the property defaults to “true”, vSphere HA will assume the VMs were cleanly powered off. This would result in vSphere HA not taking any action in a PDL scenario. By setting “das.maskCleanShutdownEnabled” to true, a scenario where all VMs are killed but never restarted can be avoided, as you are telling vSphere HA to assume that VMs were not shut down cleanly. In that case vSphere HA will assume VMs were killed UNLESS the property is explicitly set.
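To make the logic above a bit more concrete, here is a minimal Python sketch of the decision as described in this post. It is not how vSphere HA is actually implemented: the function names are hypothetical, and using None to represent "the property could never be written" is an assumption made purely for illustration.

```python
from typing import Optional


def effective_clean_power_off(recorded: Optional[bool],
                              mask_clean_shutdown_enabled: bool) -> bool:
    """Return the value HA would act on for runtime.cleanPowerOff.

    recorded is True when a user-initiated power-off wrote the property,
    and None when nothing could be written (e.g. the VM was killed while
    its datastore was in a PDL state). Hypothetical model, not HA code.
    """
    if recorded is not None:
        # The property was explicitly set, so it is always honoured.
        return recorded
    # Nothing was recorded: without the mask the default is "clean",
    # with das.maskCleanShutdownEnabled=true the default is "not clean".
    return not mask_clean_shutdown_enabled


def ha_restarts_vm(recorded: Optional[bool],
                   mask_clean_shutdown_enabled: bool) -> bool:
    """HA only restarts VMs it believes were not cleanly powered off."""
    return not effective_clean_power_off(recorded, mask_clean_shutdown_enabled)


if __name__ == "__main__":
    # VM killed during PDL, nothing recorded
    print(ha_restarts_vm(None, mask_clean_shutdown_enabled=False))
    print(ha_restarts_vm(None, mask_clean_shutdown_enabled=True))
    # VM powered off by an administrator, property explicitly set
    print(ha_restarts_vm(True, mask_clean_shutdown_enabled=True))
```

In this model the only case that changes when the mask is enabled is the one where nothing was recorded, which is exactly the PDL scenario.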
If you have a stretched cluster environment, make sure to configure these settings accordingly!
Hi Duncan,
Great Post. It was referred to us by our TAM from VMware.
I have an additional scenario where these settings are useful:
If you have blade servers as the physical hardware for a vSphere 5 cluster and you put the cluster hosts into different blade enclosures, each enclosure can be looked at as a remote site (as in a stretched cluster) even though they are in the same datacenter.
In the event of a SAN-Connection Failure on one of the enclosures the hosts in the other enclosures could restart the VMs via HA.
We had a firmware issue (during a firmware upgrade) that screwed up both FlexFabric modules in one enclosure. In the APD situation that followed for some hosts, HA would not restart the VMs on the other, unaffected hosts in the same cluster.
Hi Duncan,
Your statement about “das.maskCleanShutdownEnabled” contradicts VMware's statement in KB 2033250 and in the vSphere 5.5 documentation center. You state “This setting allows HA to differentiate between a virtual machine which was killed due to the PDL state or a virtual machine which has been powered off by an administrator.” VMware states that ALL VMs are restarted, including the cleanly shut down ones.
Which statement is correct?
Jean
What this setting does is the following:
Before this setting existed, when a VM crashed due to the storage being gone, the state of the VM was always “clean shutdown” by default. In other words, the hypervisor did not know what happened to the VM and assumed it was cleanly shut down. When a VM is cleanly shut down, HA does not restart it, as it assumes you powered it off for a reason.
This setting lets you explicitly define what the default behaviour should be when storage goes missing.