
Yellow Bricks

by Duncan Epping


stretched cluster

vSphere HA respecting VM-Host should rules?

Duncan Epping · Mar 5, 2015 ·

A long time ago I authored this white paper around stretched clusters. During our testing, the one thing where we felt HA was lacking was that it would not respect VM-Host should rules. So if you had these rules configured in a cluster and a host failed, VMs could be restarted on ANY given host in the cluster. The first time DRS ran afterwards, it would move the VMs back to where they belonged according to the configured VM-Host should rules.

I guess one of the reasons for this was that the affinity and anti-affinity rules were originally designed to be DRS rules. Over time we realized that these are not really DRS rules but rather cluster rules. Based on the findings from authoring the white paper we filed a bunch of feature requests, and one of them just made it into vSphere 6.0. As of vSphere 6.0 it is possible to have vSphere HA respect VM-Host should rules through the use of an advanced setting called “das.respectVmHostSoftAffinityRules”.

When “das.respectVmHostSoftAffinityRules” is configured, vSphere HA will try to respect the rule when it can. So if any hosts in the cluster belong to the same VM-Host group, HA will restart the respective VM on one of those hosts. Of course, as this is a “should rule”, HA has the ability to ignore the rule when needed. You can imagine a scenario where none of the hosts in the VM-Host should rule are available; in that case HA will restart the VM on any other host in the cluster. Useful? Yes, I think so! A sketch of how to set this programmatically follows below.
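For those who prefer to automate this, here is a minimal pyVmomi sketch of setting the advanced option on a cluster. It is a sketch under assumptions, not a definitive implementation: the vCenter address, credentials, and the cluster name “StretchedCluster” are placeholders, and error handling is omitted.

```python
# Minimal pyVmomi sketch: enable das.respectVmHostSoftAffinityRules on a
# cluster. vCenter address, credentials, and cluster name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ssl._create_unverified_context())

# Walk the inventory for the cluster object.
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "StretchedCluster")
view.Destroy()

# Add the HA (das) advanced option and reconfigure the cluster in place.
# Caution: the option array replaces existing HA advanced options, so in real
# use merge with cluster.configurationEx.dasConfig.option first.
spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(
        option=[vim.OptionValue(key="das.respectVmHostSoftAffinityRules",
                                value="true")]))
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

Disconnect(si)
```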

VPLEX Geosynchrony 5.2 supporting up to 10ms latency with HA/DRS

Duncan Epping · May 14, 2014 ·

I was just informed that as of last week VPLEX Metro with Geosynchrony 5.2 has been certified for a round-trip (RTT) latency of up to 10ms while running HA/DRS in a vMSC solution. So far all vMSC solutions had been certified at 5ms RTT, so this is a major breakthrough if you ask me. Great to see that EMC spent the time certifying this, including support for HA and DRS across this distance.

Round-trip-time for a non-uniform host access configuration is now supported up to 10 milliseconds for VPLEX Geosynchrony 5.2 and ESXi 5.5 with NMP and PowerPath

More details on this topic can be found here:

  • http://kb.vmware.com/kb/2007545
  • http://logicalblock.wordpress.com/2014/05/13/vmsc-support-now-extended-to-10-msec-rtt/

HA restarts in a DR/DA event

Duncan Epping · May 3, 2014 ·

I received a couple of questions last week about HA restarts in the scenario where a full site failure has occurred, or where part of the storage system has failed and needs to be taken over by another datacenter. Yes indeed, this is related to stretched clusters and HA restarts in a DR/DA event.

The questions were straightforward: how does the restart time-out work, and what happens after the last retry? I wrote about HA restarts and the restart sequence last year, so let's just copy and paste that here:

  • Initial restart attempt
  • If the initial attempt failed, a restart will be retried after 2 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 4 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 8 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 16 minutes of the previous attempt

You can extend the number of restart retries by increasing the value “das.maxvmrestartcount”; after that, a new restart will be attempted every 15/16 minutes. The question this triggered, though, is why it would even take four retries. The answer I got was: we don’t know if we will be able to fail over the storage within 30 minutes and if we will have sufficient compute resources… The sketch below shows how these intervals add up.
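To make the timeline concrete, here is a back-of-the-envelope sketch in plain Python (not VMware code) that adds up the intervals from the list above, assuming the default of five total attempts implied by that list and a flat 16-minute interval for any later retries.

```python
# Sketch of the vSphere HA restart timeline described above: an initial
# attempt, retries 2/4/8/16 minutes after the previous one, then one roughly
# every 16 minutes when das.maxvmrestartcount is raised beyond the default.

def restart_schedule(max_attempts=5):
    """Minute offsets of each restart attempt, relative to the first."""
    delays = [2, 4, 8, 16]              # back-off between the early attempts
    times, t = [0], 0                   # attempt 1 fires immediately
    for n in range(1, max_attempts):
        t += delays[n - 1] if n <= len(delays) else 16
        times.append(t)
    return times

print(restart_schedule())               # [0, 2, 6, 14, 30]: ~30 min in total
print(restart_schedule(7))              # extra retries land at 46 and 62 min
```

Note how the default schedule ends around the 30-minute mark, which is exactly the window the question above was worried about.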

Here comes the sweet part about vSphere HA: it actually is a pretty smart solution and knows whether or not VMs can be restarted. In this case, as the datastore is not available, there is absolutely no point in even trying, so HA will not even bother. As soon as the storage becomes available, though, the restart attempts will start. The same applies to compute resources: if for whatever reason there are insufficient unreserved compute resources to restart your VMs, HA will wait for them to become available… nice, right?! Do note that I emphasized the word “unreserved”, as that is what HA cares about, not the actually used resources.

Disable “Disk.AutoremoveOnPDL” in a vMSC environment!

Duncan Epping · Nov 8, 2013 ·

** UPDATE 20-March 2016 **

When using vSphere 6.0 or higher, please be advised that Disk.AutoremoveOnPDL needs to be set to 1 (the default value) in order for “PDL scenarios” to be handled correctly in vMSC based infrastructures. Please do not change the default value; when upgrading to vSphere 6.x, set this value back to 1 if it was changed in a previous version.

** UPDATE 20-March 2016 **

Last week I tweeted the recommendation to disable the advanced setting Disk.AutoremoveOnPDL in a vSphere 5.5 vMSC environment:

https://twitter.com/DuncanYB/status/394740133079298048

Based on this tweet I received a whole bunch of questions. Before I explain why, I want to point out that I have contacted the folks in charge of the vMSC program and have asked them to publish a KB article on this subject as soon as possible.

With vSphere 5.5 a new setting was introduced called “Disk.AutoremoveOnPDL”. When you install 5.5, this setting is set to 1, which means it is enabled. What it does is the following:

The host automatically removes the PDL device and all paths to the device if no open connections to the device exist, or after the last connection closes. If the device returns from the PDL condition, the host can discover it, but treats it as a new device. Data consistency for virtual machines on the recovered device is not guaranteed.

(Source: http://pubs.vmware.com/vsphere-55/index.jsp?topic=%2Fcom.vmware.vsphere.storage.doc%2FGUID-45CF28F0-87B1-403B-B012-25E7097E6BDF.html)

In a vMSC environment you can understand that removing devices which are in a PDL state is not desired: when the issue that caused the PDL has been resolved (from a networking or array perspective), customers would expect the LUNs to automatically appear again. However, as the devices have been removed, a manual “rescan” is needed to show them again instantly, or you will need to wait for the periodic vSphere path evaluation to occur. As you can imagine, in a vSphere Metro Storage Cluster environment (stretched storage) you expect devices to be available instantly on recovery; even when they have been in a PDL or APD state, they should be available as soon as the situation has been resolved.

For now, I recommend setting Disk.AutoremoveOnPDL to 0 instead of the default of 1; a sketch of how to do this programmatically follows below.
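As an illustration, here is a minimal pyVmomi sketch that applies the change to every host vCenter knows about. It is a sketch under assumptions, not a definitive procedure: the vCenter address and credentials are placeholders, and depending on your pyVmomi version the integer value may need to be passed as an explicit VMODL long, as shown.

```python
# Minimal pyVmomi sketch: set Disk.AutoremoveOnPDL to 0 on every host.
# The vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim, VmomiSupport

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ssl._create_unverified_context())

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

Long = VmomiSupport.vmodlTypes["long"]   # host advanced options are typed longs
for host in view.view:
    host.configManager.advancedOption.UpdateOptions(
        changedValue=[vim.OptionValue(key="Disk.AutoremoveOnPDL",
                                      value=Long(0))])  # 0 = disabled

view.Destroy()
Disconnect(si)
```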

Hopefully this KB on the topic of Disk.AutoremoveOnPDL will soon be updated to reflect this.

vSphere 5.5 nuggets: changes to disk.terminateVMOnPDLDefault

Duncan Epping · Aug 28, 2013 ·

Those who were in the vSphere 5.5 beta program might have noticed it, but I am suspecting many did not. With vSphere 5.5 there is finally an advanced setting to enable Disk.terminateVMOnPDLDefault. This setting was introduced with vSphere 5.0 and unfortunately needed to be enabled in a file (/etc/vmware/settings), which was inconvenient to say the least. I asked the engineering team what the plans were to improve this, but there were no direct plans. It took a bit longer than expected, but nevertheless the feature request I created made it into the product. So if you are using a vSphere Metro Storage Cluster (what a coincidence, I am presenting on this topic in an hour at VMworld), please note that the following method should now be used to allow vSphere HA to respond to a Permanent Device Loss, aka PDL:

  1. Browse to the host in the vSphere Web Client navigator
  2. Click the Manage tab and click Settings
  3. Under System, click Advanced System Settings
  4. In Advanced System Settings, select “VMkernel.Boot.terminateVMOnPDL”
  5. Click the Edit button (pencil) to edit the value and set it to “Yes”
  6. Click OK

Note the change in setting name from Disk.terminateVMOnPDLDefault to VMkernel.Boot.terminateVMOnPDL!
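For those who would rather script this than click through the Web Client, here is a minimal pyVmomi sketch that sets the same option and reads it back. The host name, vCenter address, and credentials are placeholders, and keep in mind that VMkernel.Boot settings are read at boot time, so a host reboot is needed for the change to take effect.

```python
# Minimal pyVmomi sketch: enable VMkernel.Boot.terminateVMOnPDL on one host
# and read it back. Host name, vCenter address, and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ssl._create_unverified_context())

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esxi01.example.com")
view.Destroy()

opt_mgr = host.configManager.advancedOption
opt_mgr.UpdateOptions(changedValue=[
    vim.OptionValue(key="VMkernel.Boot.terminateVMOnPDL", value=True)])

# Verify the change; it only takes effect after the host is rebooted.
print(opt_mgr.QueryOptions(name="VMkernel.Boot.terminateVMOnPDL")[0].value)

Disconnect(si)
```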

