There aren’t a lot of changes in 5.5 when it comes to vSphere High Availability (HA), but one is worth noting. As most of you are probably aware, vSphere HA in the past did nothing with VM-to-VM affinity or anti-affinity rules. For people using “affinity” rules this typically was not an issue, but those using “anti-affinity” rules did see it as a problem. They created these rules to ensure specific virtual machines would never run on the same host, but vSphere HA would simply ignore the rules when a failure occurred and just place the VMs “randomly”. With vSphere 5.5 this has changed! vSphere HA is now “anti-affinity” aware. To ensure anti-affinity rules are respected you will need to set an advanced setting:
das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"
Note that this also means that when you configure anti-affinity rules, have this advanced setting set to “true”, and somehow there aren’t sufficient hosts available to respect these rules… then rules will be respected and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and configuring these rules.
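To make that trade-off a bit more concrete, here is a small toy model in Python. It is not how vSphere HA is implemented; the function `pick_restart_host`, the host names, and the VM names are all made up for illustration, and it ignores capacity, admission control, and everything else HA takes into account. The `respect_rules` flag simply stands in for das.respectVmVmAntiAffinityRules.

```python
from typing import Dict, List, Optional, Set


def pick_restart_host(
    vm: str,
    surviving_hosts: List[str],
    running: Dict[str, Set[str]],
    anti_affinity_rules: List[Set[str]],
    respect_rules: bool,
) -> Optional[str]:
    """Toy placement: return the first acceptable host, or None if there is none.

    respect_rules is a stand-in for das.respectVmVmAntiAffinityRules
    (hypothetical model, not the real HA placement logic).
    """
    for host in surviving_hosts:
        if respect_rules:
            # Collect every VM that must be kept apart from `vm`.
            peers: Set[str] = set()
            for rule in anti_affinity_rules:
                if vm in rule:
                    peers |= rule - {vm}
            # Rule treated as a hard constraint: skip hosts running a peer.
            if peers & running.get(host, set()):
                continue
        return host
    # No host satisfies the rule, so the VM is not restarted.
    return None


# Three database VMs that must never share a host; the host running db3 failed.
rules = [{"db1", "db2", "db3"}]
running = {"esx1": {"db1"}, "esx2": {"db2"}}

strict = pick_restart_host("db3", ["esx1", "esx2"], running, rules, True)
relaxed = pick_restart_host("db3", ["esx1", "esx2"], running, rules, False)
print(strict, relaxed)  # prints: None esx1
```

With the flag set to true and only two hosts left, there is no host that keeps db3 away from db1 and db2, so the model returns no host at all. With the flag set to false the VM lands on the first host and the rule is violated until something corrects it.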
James Hess says
“then rules will be respected and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and configuring these rules.”
Seriously?
I want anti-affinity rules to be respected after a failover, but I also want the rule to be superseded if there is no way the hosts that remain online can still implement the rule.
The improvement doesn’t seem very useful if they have inserted such a major drawback into it: the anti-affinity rules are now treated as hard constraints for purposes of HA.
Is it too much to ask to have HA respect soft constraints when possible, without compromising the purpose of HA?
Duncan says
DRS fixes those violated affinity rules within minutes, so you more or less have that today.
John Nicholson says
Problem is, if Oracle sees a CPU ID I have to license it….
James Hess says
“DRS fixes those violated affinity rules within minutes, so you more or less have that today.”
I used to think that, before I virtualized vCenter and stacked 400 VMs on the same 3-node cluster that vCenter runs on. DRS definitely fixes affinity rule violations, eventually. After HA restarts the SQL server and the vCenter server, it still sometimes takes a bit longer than 20 minutes all in all, and if vCenter doesn’t recover properly after the crash, or for some reason HA doesn’t fail it over quickly, the cluster gets stuck with suboptimal VM placement that violates the affinity rules.
The bigger concern, though, is temporary memory pressure after the HA event, before workloads get balanced across the remaining active hosts. The HA failover doesn’t seem to put VMs in an optimal place; there are plenty of improvements VMware could make here, such as making DRS more resilient against vCenter failures.
Eric Gray says
Excellent – 400 VMs on a 3-node cluster!
Duncan says
I would argue that your scenario is pretty much a corner case. Not many customers run 400 VMs including VC on just 3 nodes.
Anyway, I had already asked the HA team to consider making this “preferential” instead of “mandatory”; hopefully they can sneak that in at some point.
Eduardo says
Hi Duncan,
I assume that ForceAffinePoweron is still in place and required if I want to guarantee my rules when powering on a VM (not by HA), right?
thanks a lot.
Edu
Vaibhav says
Hi Duncan,
Will you be releasing a VMware vSphere 5.5 Clustering Deepdive?
Thanks
Mike says
I’d love to know this as well. I am currently working on my VCAP-DCA and about to read 5.1 again – would be nice to have 5.5 in the pipeline though 🙂
Peter Van Lone says
ditto – looking forward to the 5.5 HA deep dive!