I was asked the following question today and thought the answer would be valuable for everyone:
How is a host selected for VM placement when HA restarts VMs from a failed host?
It’s actually a really simple mechanism. HA keeps track of the unreserved capacity of each host in the cluster. When a fail-over needs to occur, the hosts are ordered by unreserved capacity, and the host with the most unreserved capacity is the first option. To make it absolutely crystal clear: it is HA that keeps track of the unreserved capacity, not DRS. HA works completely independently of vCenter and, as we all know, DRS is part of vCenter. HA also works when DRS is disabled or unlicensed!
One thing to note is that HA will also verify whether the host is compatible with the VM. This means HA checks that the VM’s network and the VM’s datastore are both available on the target host. If both are available, a restart will be initiated on that host. To summarize:
- Order available hosts based on unreserved capacity
- Check compatibility (VM Network / Datastore)
- Boot up!
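The three steps above can be sketched in a few lines of Python. This is only an illustration of the ordering and compatibility logic as described in this post, not HA's actual implementation. The `Host` and `VM` classes, the single `unreserved_capacity` number (real HA tracks CPU and memory reservations) and the fall-through to the next host when a compatibility check fails are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical view of a host as seen by the fail-over logic."""
    name: str
    unreserved_capacity: int  # assumption: one number instead of separate CPU/memory values
    networks: FrozenSet[str]
    datastores: FrozenSet[str]


@dataclass(frozen=True)
class VM:
    """Hypothetical view of a VM that needs to be restarted."""
    name: str
    network: str
    datastore: str


def is_compatible(host: Host, vm: VM) -> bool:
    # Step 2: the VM's network and datastore must both be available on the host.
    return vm.network in host.networks and vm.datastore in host.datastores


def select_host(hosts: List[Host], vm: VM) -> Optional[Host]:
    # Step 1: order hosts, most unreserved capacity first.
    ranked = sorted(hosts, key=lambda h: h.unreserved_capacity, reverse=True)
    # Step 3: restart on the first compatible host.
    # Assumption: an incompatible host is skipped and the next one is tried.
    for host in ranked:
        if is_compatible(host, vm):
            return host
    return None  # no compatible host available


hosts = [
    Host("esx01", 4000, frozenset({"VM Network"}), frozenset({"ds1"})),
    Host("esx02", 6000, frozenset({"VM Network"}), frozenset({"ds2"})),
    Host("esx03", 2000, frozenset({"VM Network"}), frozenset({"ds1", "ds2"})),
]
vm = VM("web01", network="VM Network", datastore="ds1")
target = select_host(hosts, vm)
```

In this made-up cluster, esx02 has the most unreserved capacity but cannot see the VM's datastore, so the sketch picks esx01, the next host in the ordering.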
Jason Boche says
Another point which may be useful to address… I’ve heard confusion on whether VMs restart on a single HA host or whether VMs will start on multiple hosts, thus spreading the load “evenly”.
Rick Boyett says
Looking for clarification..
Is HA doing the resource tracking or is it DRS? HA isn’t supposed to be dependent on vCenter but we all know DRS is. The point of that being that HA can still restart your VMs on another host if vCenter goes down during the host outage (due to vCenter being hosted on a VM).
So, is HA or DRS doing the actual resource tracking? If it really is HA, is that resource tracking dependent on vCenter? If DRS is actually doing the tracking, then what is HA’s behavior if a host goes down and takes vCenter with it?
I speculate that DRS is doing the resource tracking but it regularly reports to HA and lets it know the order of available hosts. HA then goes by the information available to it if DRS is down during a host failure.
Does that make sense or am I talking out of my 3rd point of contact? (^_~)
Jason Boche says
@Rick Boyett
http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
“A VMware HA Cluster consists of nodes, primary and secondary nodes. Primary nodes hold cluster settings and all “node states” which are synchronized between primaries. Node states hold for instance resource usage information. In case that vCenter is not available the primary nodes will have a rough estimate of the resource occupation and can take this into account when a fail-over needs to occur. Secondary nodes send their state info to the primary nodes.”
Craig Risinger says
HA isn’t really tracking “resource utilization” so much as “resource Reservations”. How much of a resource has been Reserved is less dynamic than how much is being used, so this is a relatively simple set of data for HA primary nodes to track. Therefore they can do it without VC.
Whenever HA is considering capacity, it’s looking at Reservations, not actual utilization. (Duncan et al., please correct me if I’m wrong here.)
Once a VM has been restarted on whatever host HA deems apt, DRS can rebalance the load on the remaining hosts. But HA and DRS are acting independently. Their independent effects add to a good result without requiring coordination.
Duncan Epping says
Craig is correct and I tried to clarify the article.
There is no integration between DRS and HA. You can run HA without DRS. And indeed, HA uses reservations, not utilization.
Rick Boyett says
@Craig Risinger
That is a crystal clear explanation.
Thanks
Rob says
So in the VI 3 days, it was my experience that all of the VMs would fail over to one host in the cluster. Usually, this left an unbalanced cluster even when DRS was enabled at 3 or 4 stars (less aggressive). So my question: does vSphere’s HA now distribute VMs for restart to multiple hosts, or is it still restarting all VMs from the failed host on a single recovery host and then expecting DRS to perform load balancing over time?
Brandon says
Rob, yes, I was going to say the same thing. I experienced that in 3.5; it would definitely be a change for 4.x. A welcome surprise, but HA is then making decisions based on reservations only, without any awareness of what happens when the VM is powered on.
I thought if DRS was enabled and available, then initial placement would be used if DRS was set to at least partial auto?
Anton Zhbankov says
Duncan, Russian version – http://blog.vadmin.ru/2010/06/ha-deepdive-host-selection.html
Craig Risinger says
P.S. Another way to put a point:
HA and DRS are complementary but not coordinated.