When doing some research for the vSphere Clustering Technical Deepdive book I stumbled across something that was very surprising and difficult to grasp at first, and I figured explaining it in a short article was the best approach. Many of you have read the HA deepdive article or the book and know that das.failuredetectiontime is probably the most commonly used advanced setting when configuring HA. All sorts of recommendations and best practices have been flying around, many of which were, to be honest, plain confusing. As stated in the previous article, das.failuredetectiontime is no longer needed and has been deprecated. Did anything else change from an advanced settings perspective? Have advanced settings been added or removed? Here is the new list:
- das.ignoreInsufficientHbDatastore – 5.0 only
Suppresses the host configuration issue that is raised when the number of heartbeat datastores is less than das.heartbeatDsPerHost. Default value is “false”. Can be configured as “true” or “false”.
- das.heartbeatDsPerHost – 5.0 only
The number of required heartbeat datastores per host. The default value is 2; the value should be between 2 and 5.
- das.failuredetectiontime – 4.1 and prior
The timeout, in milliseconds, for the isolation response action (default 15000). Before vSphere 4.0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. This is no longer needed. For a host with two Service Consoles or a secondary isolation address, a das.failuredetectiontime of 15000 is recommended.
- das.isolationaddress[x] – 5.0 and prior
The IP address the ESX host uses to check for isolation when no heartbeats are received, where [x] = 0-9 (see screenshot below for an example). VMware HA will use the default gateway as an isolation address and the provided value as an additional checkpoint. I recommend adding an isolation address when a secondary Service Console is being used for redundancy purposes.
- das.usedefaultisolationaddress – 5.0 and prior
Value can be “true” or “false”. Set it to “false” when the default gateway, which is the default isolation address, should not or cannot be used for this purpose. In other words, if the default gateway is a non-pingable address, set “das.isolationaddress0” to a pingable address and disable the use of the default gateway by setting this option to “false”.
- das.isolationShutdownTimeout – 5.0 and prior
Time in seconds to wait for a VM to become powered off after initiating a guest shutdown, before forcing a power off.
- das.allowNetwork[x] – 5.0 and prior
Enables the use of port group names to control the networks used for VMware HA, where [x] = 0 – ?. You can set the value to “Service Console 2” or “Management Network” to use (only) the networks associated with those port group names in the networking configuration.
- das.bypassNetCompatCheck – 4.1 and prior
Disables the “compatible network” check for HA that was introduced with ESX 3.5 Update 2. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, so-called incompatible networks. Default value is “false”; setting it to “true” disables the check.
- das.ignoreRedundantNetWarning – 5.0 and prior
Removes the error icon/message from your vCenter when you don’t have a redundant Service Console connection. Default value is “false”; setting it to “true” will disable the warning. HA must be reconfigured after setting the option.
- das.vmMemoryMinMB – 5.0 and prior
The minimum default memory slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotMemInMB”.
- das.slotMemInMB – 5.0 and prior
Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
- das.vmCpuMinMHz – 5.0 and prior
The minimum default CPU slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotCpuInMHz”.
- das.slotCpuInMHz – 5.0 and prior
Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
- das.sensorPollingFreq – 4.1 and prior
Sets the time interval for HA status updates. As of vSphere 4.1, the default value of this setting is 10. It can be configured between 1 and 30, but it is not recommended to decrease this value as it might lead to less scalability due to the overhead of the status updates.
- das.perHostConcurrentFailoversLimit – 5.0 and prior
By default, HA will issue up to 32 concurrent VM power-ons per host. This setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently but will also increase the average latency to recover as it adds more stress on the hosts and storage.
- das.config.log.maxFileNum – 5.0 only
Desired number of log rotations.
- das.config.log.maxFileSize – 5.0 only
Maximum file size in bytes of the log file.
- das.config.log.directory – 5.0 only
Full directory path used to store log files.
- das.maxFtVmsPerHost – 5.0 and prior
The maximum number of primary and secondary FT virtual machines that can be placed on a single host. The default value is 4.
- das.iostatsinterval (VM Monitoring) – 5.0 and prior
The I/O stats interval determines if any disk or network activity has occurred for the virtual machine. The default value is 120 seconds.
- das.failureInterval (VM Monitoring) – 5.0 and prior
The polling interval for failures. Default value is 30 seconds.
- das.minUptime (VM Monitoring) – 5.0 and prior
The minimum uptime in seconds before VM Monitoring starts polling. The default value is 120 seconds.
- das.maxFailures (VM Monitoring) – 5.0 and prior
Maximum number of virtual machine failures within the specified “das.maxFailureWindow”. If this number is reached, VM Monitoring doesn’t restart the virtual machine automatically. Default value is 3.
- das.maxFailureWindow (VM Monitoring) – 5.0 and prior
Minimum number of seconds between failures. Default value is 3600 seconds. If a virtual machine fails more than “das.maxFailures” times within 3600 seconds, VM Monitoring doesn’t restart the machine.
- das.vmFailoverEnabled (VM Monitoring) – 5.0 and prior
If set to “true”, VM Monitoring is enabled. When it is set to “false”, VM Monitoring is disabled.
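To make the interplay between “das.maxFailures” and “das.maxFailureWindow” a bit more concrete, here is a minimal Python sketch of the restart decision as I read the descriptions above. It is a simplified model and not how HA implements it: the names VmMonitoringPolicy and should_restart are made up for this example, and I am assuming that a restart is skipped once the number of earlier failures inside the window has reached “das.maxFailures”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmMonitoringPolicy:
    """Hypothetical container for two VM Monitoring settings from the list."""
    max_failures: int = 3           # das.maxFailures (default per the list)
    max_failure_window: int = 3600  # das.maxFailureWindow, in seconds


def should_restart(previous_failures, now, policy=VmMonitoringPolicy()):
    """Return True if this simplified model would restart the VM.

    previous_failures: timestamps (in seconds) of earlier failures of this VM.
    now: timestamp (in seconds) of the failure being handled.
    Assumption: no restart once max_failures earlier failures fall inside
    the window that ends at `now`.
    """
    window_start = now - policy.max_failure_window
    recent = [t for t in previous_failures if t > window_start]
    return len(recent) < policy.max_failures


# Three earlier failures within the last hour: no automatic restart.
print(should_restart([0, 100, 200], now=300))   # False
# More than an hour later only one earlier failure is still in the window.
print(should_restart([0, 100, 200], now=3700))  # True
```

The point of the sketch is only that the window slides with time: failures that are older than “das.maxFailureWindow” stop counting against the virtual machine.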
Please note that this is the full list that I am aware of today, over time I will add / remove where and when applicable.
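The relationship between “das.isolationaddress[x]” and “das.usedefaultisolationaddress” tends to cause confusion, so here is a small Python sketch of which addresses end up being checked according to the descriptions in the list. Treat it as an illustration only: the function name isolation_addresses is hypothetical, the IP addresses are placeholders, and the real HA agent may well behave differently in details (ordering, validation) that the list does not cover.

```python
def isolation_addresses(options, default_gateway):
    """Return the isolation addresses implied by a dict of advanced settings.

    options: mapping of advanced setting name to string value.
    default_gateway: the host's default gateway (the default isolation address).
    Assumption: the default gateway is used unless
    das.usedefaultisolationaddress is explicitly set to "false".
    """
    addresses = []
    use_default = options.get("das.usedefaultisolationaddress", "true")
    if use_default.strip().lower() != "false":
        addresses.append(default_gateway)
    # The list above gives [x] = 0-9 for das.isolationaddress[x].
    for x in range(10):
        value = options.get(f"das.isolationaddress{x}")
        if value:
            addresses.append(value)
    return addresses


# Non-pingable gateway: add a pingable address and disable the default.
settings = {
    "das.isolationaddress0": "192.168.1.10",    # placeholder address
    "das.usedefaultisolationaddress": "false",
}
print(isolation_addresses(settings, "192.168.1.1"))  # ['192.168.1.10']
```

Without any advanced settings the function simply returns the default gateway, which matches the default behavior described above.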
Wouter Verhulst says
I have to compliment you on this HA series. Very informative!
Just one small error in part 5: the text for das.isolationaddress[x] seems incorrect.
Keep up the good work.
Duncan Epping says
Thanks, updated!
Steffen Oezcan says
There seems to be the same error with the “das.allowNetwork[x] – 5.0 and prior”-part 😉
Rickard Nobel says
“For a host with two Service Consoles or a secondary isolation address, it still is a best practice to increase the value to at least 20000. The impact of this is that the failover response will be delayed.”
But why should it still be a best practice to increase the value when using multiple isolation addresses? As we discussed earlier, this would not really change anything?
Duncan Epping says
Correct, this article was written before that discussion. Will make the change.
Anand says
Since the das.bypassNetCompatCheck setting is no longer supported, could you please advise which alternative setting is in place? We are currently upgrading our environment from ESX 4.1 to ESXi 5.5.
Just curious to know because we have clusters running in split chassis.