I was talking to the HA team this week, specifically about the upcoming HA book. One thing they mentioned is that people still seem to struggle with the concept of admission control. This is the reason I wrote a whole chapter on it years ago, yet there still seems to be a lot of confusion. One thing that is not clear to people is the percentage calculation. We have some customers with VMs with extremely large reservations; in that case they typically switch from the “slot policy” to the “percentage based policy”, simply because the percentage based policy is a lot more flexible.
However, recently we have had some customers hit the following error message:
Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster
In these particular situations (yes, there was a bug as well, read this article on that), the error appeared because the percentage was set lower than what would equal a full host. In other words, in a 4-host environment a single host equals 25%. In some cases customers had set the percentage to a value lower than 25%. I am personally not sure why anyone would do this, as it contradicts the whole essence of admission control. Nevertheless, it happens.
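To make the arithmetic concrete, here is a quick back-of-the-envelope sketch in Python. The host sizes and the configured percentage are hypothetical, purely for illustration:

```python
# Back-of-the-envelope check: what share of cluster capacity does
# each host represent, and is the configured admission control
# percentage at least one host's worth? Numbers are hypothetical.

hosts_gb = [256, 256, 256, 256]  # memory per host in a 4-host cluster
configured_pct = 20              # percentage reserved for failover

total_gb = sum(hosts_gb)
largest_share_pct = max(hosts_gb) / total_gb * 100  # 25% here

print(f"Largest host equals {largest_share_pct:.0f}% of the cluster")
if configured_pct < largest_share_pct:
    print("Configured percentage is lower than one host's worth: "
          "expect the 'Insufficient configured resources' message")
```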
This message indicates that, in the case of a host failure, you may not have sufficient resources to restart all the VMs. This of course is the result of the percentage being set lower than the value that would equal a single host. Note, though, that this does not stop you from powering on new VMs. You will only be stopped from powering on new VMs when you exceed the available unreserved resources.
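To illustrate when a power-on would actually be blocked under the percentage policy, here is a minimal sketch (memory only, hypothetical numbers; the real calculation covers CPU as well):

```python
# Simplified percentage-policy admission check (memory only,
# hypothetical numbers). A power-on is only denied when the new
# reservation would eat into the capacity set aside for failover.

total_gb = 1024          # total cluster memory
failover_pct = 25        # configured admission control percentage
reserved_gb = 700        # sum of reservations of powered-on VMs
new_vm_reservation = 80  # reservation of the VM being powered on

available_gb = total_gb * (1 - failover_pct / 100)  # usable capacity
if reserved_gb + new_vm_reservation > available_gb:
    print("Power-on denied: insufficient unreserved resources")
else:
    print("Power-on allowed")
```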
So if you are seeing this error message, please verify the configured percentage if you set it manually. Ensure that, at a minimum, it equals the capacity of the largest host in the cluster.
** back to finalizing the book **
Sketch says
upcoming HA book? link to pre-order?
duncan@yellow-bricks says
No pre-order yet. Expect it around VMworld!
Ronny says
Hi Duncan,
thanks for this great article! There is still a lot of misunderstanding about how admission control works. This definitely helps people understand it.
What I’d really like to see in a future release is the possibility to exclude dev/test VMs from the Admission Control calculation when you don’t want them to be restarted during a host/site failure (HA restart disabled). In that case you could go beyond 50% resource usage in a stretched cluster environment during normal operations but still tolerate a site failure for your production workload. Or am I missing something and this is already possible today?
regards,
Ronny
duncan@yellow-bricks says
This is not possible today. It is definitely something I have asked for, but unfortunately it hasn’t made it in. I will ask again!
David Ho says
Duncan,
What are your thoughts on HA offering a VM power off/do not restart option? I recently designed a metro cluster solution for a customer and noticed a lot of idle resources (admission control being n/2 rather than the typical n+1).
If a VM power off/do not restart option were available, it would be possible to have tier 1 VMs set to restart, tier 2 VMs set to power off/do not restart, and admission control close to n+1 in a metro cluster design.
David
duncan@yellow-bricks says
This is similar to what Ronny asks above. It is definitely something I have asked for, but unfortunately it hasn’t made it in. I will ask again!
Steve says
Hi Duncan,
We are in the process of upgrading our environment to vSphere 6.5. We already have our VCSA on 6.5, latest build, and are busy upgrading our hosts. We have quite a few clusters within this vCenter implementation, with varying numbers of hosts in each cluster. These are also spread over two physical DCs in different locations.
We have recently started getting this exact error, but only in two clusters, one in each DC for some strange reason. One cluster consists of 4 hosts and the other closer to 10. In each of these clusters, we configure HA Admission Control with one dedicated failover host that has more memory than any other single host in the cluster.
I am aware that vSphere 6.5 has many other advanced features, but I have tried almost everything to get rid of this error, including removing the dedicated spare host altogether. What else could be causing this error on these clusters?
Many thanks,