ha

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster

Duncan Epping · Jul 12, 2018 ·

I was talking to the HA team this week, specifically about the upcoming HA book. One thing they mentioned is that people still seem to struggle with the concept of admission control. This is the reason I wrote a whole chapter on it years ago, yet there still seems to be a lot of confusion. One thing that is not clear to people is the percentage calculations. We have some customers with VMs with extremely large reservations. In that case, instead of using the “slot policy”, they typically switch to the “percentage based policy”, simply because the percentage based policy is a lot more flexible.

However, recently we have had some customers that hit the following error message:

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster

In these particular situations (yes, there was a bug as well, read this article on that), the error message appeared because the percentage was set lower than what would equal a full host. In other words, in a 4 host environment, a single host would equal 25%. In some cases, customers would set the percentage to a value lower than 25%. I am personally not sure why anyone would do this, as it contradicts the whole essence of admission control. Nevertheless, it happens.

This message indicates that you may not have sufficient resources, in the case of a host failure, to restart all the VMs. This of course is the result of the percentage being set lower than the value that would equal a single host. Note though, this does not stop you from powering on new VMs. You will only be stopped from powering on new VMs when you exceed the available unreserved resources.

So if you are seeing this error message, please verify the configured percentage if you set it manually. Ensure that at a minimum it equals the share of cluster resources provided by the largest host in the cluster.
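To make the percentage math a bit more concrete, here is a minimal sketch in Python. It is not how HA calculates things internally. It only shows the "one host out of the cluster" arithmetic described above, and it assumes you feed it the capacity of each host for a single resource (CPU or memory) in whatever unit you like. The function name and the example numbers are made up for illustration.

```python
import math


def min_failover_percentage(host_capacities):
    """Return the smallest whole percentage that covers the largest host.

    host_capacities: capacity per host for ONE resource (for example GHz
    or GB). This is a simplification for illustration only; it ignores
    reservations, overhead and anything else HA may take into account.
    """
    total = sum(host_capacities)
    if total <= 0:
        raise ValueError("cluster has no capacity")
    largest = max(host_capacities)
    # Round up, otherwise the reserved share would be less than a full host.
    return math.ceil(100 * largest / total)


# Four equal hosts: a single host equals 25% of the cluster.
print(min_failover_percentage([100, 100, 100, 100]))

# Unequal hosts: the 200-unit host is 40% of the 500-unit cluster.
print(min_failover_percentage([100, 100, 100, 200]))
```

If the percentage you configured manually is lower than the number this returns for your cluster, you are in the situation described in this post.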

** back to finalizing the book **

Trigger APD on iSCSI LUN on vSphere

Duncan Epping · Jun 21, 2018 ·

I was testing various failure scenarios in my lab today for the vSphere Clustering Deepdive session I have scheduled for VMworld. I needed some screenshots and log files of when a datastore hit an APD scenario. For those who don’t know, APD stands for all paths down. In other words: the storage is inaccessible and ESXi doesn’t know what has happened and why. vSphere HA has the ability to respond to that kind of failure. I wanted to test this, but my setup was fairly simple and virtual, so I couldn’t unplug any cables. I also couldn’t make configuration changes to the iSCSI array, as that would trigger a PDL (permanent device loss) instead. So how do you test an APD scenario?

After trying various things like killing the iSCSI daemon (it gets restarted automatically with no impact on the workload), I bumped into this command, which triggered the APD.

SSH into the host you want to trigger the APD on and run the following command:
```
esxcli iscsi session remove -A vmhba65
```
Make sure, of course, to replace “vmhba65” with the name of your iSCSI adapter.

This triggered APD, as witnessed in the fdm.log and vmkernel.log, and ultimately resulted in vSphere HA killing the impacted VM and restarting it on a healthy host. Anyway, just wanted to share this as I am sure there are others who would like to test APD responses in their labs or before their environment goes into production.

There may be other easy ways as well, if you know any, please share in the comments section.

vSphere HA Restart Priority

Duncan Epping · Apr 4, 2018 ·

I’ve seen more and more questions popping up about vSphere HA Restart Priority lately, so I figured I would write something about it. I already did in this post about what’s new in vSphere 6.5 and I did so in the Stretched Cluster guide. It has always been possible to set a restart priority for VMs, but pre-vSphere 6.5 this priority simply referred to the scheduling of the restart of the VM after a failure. Each host in a cluster can restart 32 VMs at the same time, so you can imagine that if the restart priority is only about VM restarts, it doesn’t really add a lot of value. (Simply because we can schedule many at the same time, and the priority would as such have no effect.)

As of vSphere 6.5 we have the ability to specify the priority and also specify when HA should continue with the next batch. Especially this last part is important, as this allows you to specify that we start with the next priority level when:

Resources are allocated (default)
VMs are powered on
Guest heartbeat is detected
App heartbeat is detected

I think these are mostly self-explanatory. Note though that “resources are allocated” means that a target host for the restart has been found by the master, so this happens within milliseconds. “VMs are powered on” is very similar: it also says nothing about when a VM is available. This literally is “power on”. In some cases it could take 10-20 seconds for a VM to be fully booted and the apps to be available, in other cases it may take minutes… It all depends on the services that will need to be started within the VM. So if it is important for the “service provided” by the VM to be available before starting the next batch, then option 3 (guest heartbeat) or option 4 (app heartbeat) would be your best pick. Note that with option 4 you will need to have VM/Application Monitoring enabled and application heartbeating defined within the VM. Now when you have made your choice around when to start the next batch, you can simply start adding VMs to a specific level.

Instead of the 3 standard restart “buckets” you now have 5: Highest, High, Medium, Low, Lowest. Why these funny names? Well that was done in order to stay backwards compatible with vSphere 6 / 5 etc. By default all VMs will have the “medium” restart priority, and no it won’t make any difference if you change all of them to high. Simply because the restart priority is about the priority between VMs, it doesn’t change the host response times etc. In other words, changing the restart priority only makes sense when you have VMs at different levels, and usually will only make a big difference when you also change the option “Start next priority VMs when”.
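To illustrate why the priority only matters relative to other VMs, here is a small Python sketch. It is purely an illustration of the ordering described above, not HA’s actual implementation, and it does not model the “Start next priority VMs when” condition at all. The function name and VM names are hypothetical.

```python
from collections import defaultdict

# The five restart priority levels, in the order they are processed.
PRIORITY_ORDER = ["Highest", "High", "Medium", "Low", "Lowest"]


def restart_batches(vm_priorities):
    """Group VMs into restart batches, highest priority first.

    vm_priorities maps a VM name to a priority level. A value of None
    means no override was set, so the default ("Medium") applies.
    """
    groups = defaultdict(list)
    for vm, priority in vm_priorities.items():
        groups[priority or "Medium"].append(vm)

    unknown = set(groups) - set(PRIORITY_ORDER)
    if unknown:
        raise ValueError(f"unknown priority level(s): {sorted(unknown)}")

    # Only levels that actually contain VMs form a batch.
    return [(p, sorted(groups[p])) for p in PRIORITY_ORDER if p in groups]


# Hypothetical VMs: infrastructure first, everything else afterwards.
vms = {"vcenter": "Highest", "dns": "Highest", "ad": "Highest",
       "app01": None, "test01": "Lowest"}
for level, names in restart_batches(vms):
    print(level, names)

# Putting every VM at "High" still results in a single batch.
print(restart_batches({"app01": "High", "app02": "High"}))
```

The second call shows the point made above: when all VMs sit at the same level there is only one batch, so nothing is restarted any earlier than before.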

So where do you change this? Well, that is pretty straightforward:

Click on your HA cluster and then the “Configure” Tab
Click on “VM Overrides” and then click “Add”
Click on the green plus sign and select the VMs you would like to give a higher, or lower priority
Then select the new priority and specify when the next batch should start

And if you are wondering: yes, the restart priority also applies when vCenter is not available. So you can even use it to ensure vCenter, AD and DNS are booted up first. All of this info is stored in the cluster configuration data. You can examine this on the command line, by the way, by typing the following:

/opt/vmware/fdm/fdm/prettyPrint.sh clusterconfig

Note that the output is usually pretty big, so you will have to scroll through it to find what you need. If you do a search on “restartPriority”, you should be able to find the VMs for which you changed the priority. Pretty cool right?!
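If you saved the output to a file and copied it off the host, a few lines of Python can do the searching for you. This is just a generic text filter and makes no assumptions about what the cluster configuration output looks like. The function name is made up, and the sample text below is a placeholder, not real prettyPrint.sh output.

```python
def find_with_context(text, needle="restartPriority", context=2):
    """Return each line containing `needle` plus some surrounding lines."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            hits.append("\n".join(lines[start:i + context + 1]))
    return hits


# Placeholder text only, NOT what the real output looks like.
sample = "line one\nline two\nsomething restartPriority something\nline four"
for hit in find_with_context(sample, context=1):
    print(hit)
    print("---")
```

The extra lines of context are there because the VM a setting belongs to may not be on the same line as the match.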

Oh, if you didn’t know yet… Frank, Niels and I are actively updating the vSphere Clustering Deep Dive. Hopefully we will have something out “soon”, as in around VMworld.

Insufficient configured resources to satisfy the desired vSphere HA failover level

Duncan Epping · Dec 9, 2017 ·

I was going over some of the VMTN threads and I noticed an issue brought up with Admission Control a while ago. I completely forgot about it until it was brought up again internally. With vSphere 6.5 and vSphere HA there seems to be a problem with some Admission Control Policies. When you have, for instance, selected the Percentage Based Admission Control Policy and you have a low number of hosts, you could receive the following error:

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster …

I managed to reproduce this in my lab, and this is what it looks like in the UI:

It seems to happen when there’s a minor difference in resources between the hosts, but I am not 100% certain about it. I am trying to figure out internally if it is a known issue or not, and will come back to this when I know in which patch it will be solved and/or if it is indeed a known issue.

Using HA VM Component Protection in a mixed environment

Duncan Epping · Nov 29, 2017 ·

I have some customers who are running both traditional storage and vSAN in the same environment. As most of you are aware, vSAN and VMCP do not go together at this point. So what does that mean for traditional storage, where you can benefit from VMCP in certain storage failure scenarios?

Well, the statement around vSAN and VMCP is actually a bit more delicate. vSAN does not propagate PDL or APD in a way which VMCP understands. So you can enable VMCP in your environment without it having an impact on VMs running on top of vSAN. The VMs which are running on the traditional storage will be able to use the VMCP functionality, and if an APD or PDL is declared on the LUN they are running on, vSphere HA will take action. For vSAN, well, we don’t propagate the state of a disk that way and we have other mechanisms to provide availability / resiliency.

In summary: Yes, you can enable HA VMCP in a mixed storage environment (vSAN + Traditional Storage). It is fully supported.