This has always been a hot topic: HA and slot sizes / Admission Control. One of the most extensive (non-VMware) articles on it is by Chad Sakac, aka Virtual Geek, but of course a couple of things have changed since then. Chad asked in a comment on my HA Deepdive if I could address this topic, so here you go Chad.
Let's start with the basics.
What is a slot?
A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.
In other words, the slot size is the worst-case CPU and memory reservation scenario in a cluster. This leads directly to the first "gotcha":
HA uses the highest CPU reservation of any given VM and the highest memory reservation of any given VM.
If VM1 has 2GHz and 1024MB reserved and VM2 has 1GHz and 2048MB reserved, the slot size for memory will be 2048MB plus the memory overhead, and the slot size for CPU will be 2GHz.
Now how does HA calculate how many slots are available per host?
Of course we first need to know what the slot size for memory and CPU is. We then divide the total available CPU resources of a host by the CPU slot size, and the total available memory resources of a host by the memory slot size. This leaves us with a number of CPU slots and a number of memory slots; the most restrictive of the two is the number of slots for that host. If you have 25 CPU slots but only 5 memory slots, the number of available slots for this host will be 5.
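The calculation above can be sketched as follows; the VM reservations mirror the earlier example, and the host capacity and memory overhead values are made up for illustration:

```python
# Hypothetical per-VM reservations (MHz, MB); overhead value assumed for the sketch.
vms = [
    {"cpu_mhz": 2000, "mem_mb": 1024},  # VM1
    {"cpu_mhz": 1000, "mem_mb": 2048},  # VM2
]
mem_overhead_mb = 100  # per-VM memory overhead, illustrative value

# Slot size = worst-case reservation across all powered-on VMs
cpu_slot = max(vm["cpu_mhz"] for vm in vms)                   # 2000 MHz
mem_slot = max(vm["mem_mb"] for vm in vms) + mem_overhead_mb  # 2148 MB

def slots_per_host(host_cpu_mhz, host_mem_mb):
    # The most restrictive resource determines the host's slot count
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)

print(slots_per_host(50000, 10740))  # 25 CPU slots vs 5 memory slots -> 5
```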
As you can see, this can lead to very conservative consolidation ratios. With vSphere this is configurable. If you have just one VM with a really high reservation, you can set the following advanced settings to lower the slot size used during these calculations: das.slotCpuInMHz or das.slotMemInMB. To ensure the VM with the high reservation can still be powered on, it will take up multiple slots. Keep in mind that when you are low on resources, this could mean that you are not able to power on this high-reservation VM, as resources may be fragmented throughout the cluster instead of available on a single host.
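As a sketch of what happens when you cap the slot size with those advanced settings: a VM whose reservation exceeds the capped slot size simply consumes multiple slots. All the values below are illustrative, not defaults:

```python
import math

das_slot_cpu_mhz = 500  # das.slotCpuInMHz, value chosen for illustration
das_slot_mem_mb = 512   # das.slotMemInMB, value chosen for illustration

def slots_needed(vm_cpu_mhz, vm_mem_mb):
    # A high-reservation VM takes up as many slots as needed to cover it
    return max(math.ceil(vm_cpu_mhz / das_slot_cpu_mhz),
               math.ceil(vm_mem_mb / das_slot_mem_mb))

print(slots_needed(2000, 2048))  # the one large-reservation VM needs 4 slots
```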
Now what happens if you set the number of allowed host failures to 1?
The host with the most slots will be taken out of the equation. If you have 8 hosts with 90 slots in total, where 7 hosts have 10 slots each and one host has 20, that single host will not be taken into account. Worst-case scenario! In other words, the 7 remaining hosts should be able to provide enough resources for the cluster when a failure of the "20 slot" host occurs.
And of course if you set it to 2, the next host taken out of the equation is the host with the second most slots, and so on.
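A minimal sketch of this admission control logic, using the 8-host example above (the function name is mine, not an API):

```python
def failover_slots(host_slots, host_failures_tolerated):
    # Worst case: the largest host(s) fail, so discard them from capacity
    surviving = sorted(host_slots)[:len(host_slots) - host_failures_tolerated]
    return sum(surviving)

# 7 hosts with 10 slots each plus one with 20: the 20-slot host is discarded
print(failover_slots([10] * 7 + [20], 1))  # -> 70 usable slots
```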
One thing worth mentioning: as Chad stated, with vCenter 2.5 the number of vCPUs of any given VM was also taken into account. This led to very conservative and restrictive admission control. This behavior was modified with vCenter 2.5 U2; the number of vCPUs is no longer taken into account.
Frank Wegner says
If you do not want to tweak advanced parameters you could also check if you really need the large exceptional reservations. Reducing the reservation will also increase the HA slot size. One example I know: a single VM with 40 GB memory reservation crippled HA (failover capacity = 0), setting the reservation to 10 GB helped a lot.
Frank Wegner says
oops, not “increase slot size” but “increase failover capacity”.
So the best case is when no VM in the cluster has a reservation?
It's not the best, but the least restrictive. The best is when you use a realistic reservation on at least one machine so that the slots are correctly sized.
If you pool servers by service / application and give them no direct reservation, but instead make a reservation at the resource pool level, will this avoid admission control problems?
Suttoi said, August 17th, 2009 at 08:12: If you pool servers by service / application and give them no direct reservation, but instead make a reservation at the resource pool level, will this avoid admission control problems?
can someone answer the question?
Duncan Epping says
Yes it will avoid these issues.
so reservations at resource pool level do not affect failover capacity?
Duncan Epping says
no they do not.
Jarrod Sturdivant says
If you have no CPU or Memory reservations in the virtual infrastructure, is the slot size calculated using the highest configured CPU and RAM for a virtual machine? If you have a single virtual machine that has significantly more RAM configured than your other virtual machines, how can you keep it from inflating your slot size without a reservation?
No, it is calculated using a default: 256MHz for CPU (4.1 and prior) or 32MHz for CPU (5.0 and above), and the "memory overhead" for memory.
Alan Wilson says
Are available ports/portgroups taken into consideration in the slot calculation?
Alan Wilson says
Didn't think so, but I have a maxed-out cluster (View 5, 600+ VMs) with no available slots, which won't allow any more powered-on VMs. Plenty of CPU and memory available – most of the powered-on VMs are idle, waiting for users to connect. I'm getting these errors in the vCenter log, which triggered the question.
2013-03-13T11:46:49.932Z [04240 verbose 'Default' opID=a917af28] [VpxdMoVm::PowerOnInt] PowerOnIntImpl failed on VM /vpx/vm/#18815/ for reason : vim.fault.InsufficientFailoverResourcesFault
2013-03-13T11:46:49.932Z [04240 verbose 'Default' opID=a917af28] [MoDVSwitch::CheckForEagerPortAssignment] vm [COMP1] has no late binding portgroups in datacenter [VDI1] to connect to, moving on
2013-03-13T11:46:49.932Z [04240 error 'Default' opID=a917af28] (Log recursion level 2) vim.fault.InsufficientFailoverResourcesFault
Thanks for your quick response. will investigate further 🙂
Hold on, there is a difference between having available resources and the slot size.
Slot size = based on reservations / memory overhead etc.
Used resources = what the VMs are actually using, which says nothing about what is reserved.
Do you have a reservation set somewhere? Which version of vSphere are you using? Why are you using the “Host failures” admission control and not the “Percentage based”?
Alan Wilson says
vCenter 5.0.0, ESXi 5.0.0, 702118, View 5.0.1
Why are you using the “Host failures” admission control and not the “Percentage based”?
Good question: I’ve joined this project late and just starting to pick it apart. 😉 The original VMware design doc did specify 17% for the VDI clusters but they’ve all been set to 1 host failover. Just trying to find out why.
What may have happened to trigger the errors is that some pools were recently changed to have 2-vCPU VMs rather than 1, and HA, using the worst-case scenario, changed the slot size from 1 to 2 vCPUs in response. I could always set the slot size manually, I suppose, but I'd prefer to let HA do it, in case things change in the future.
No reservations set anywhere that I can see. How often is the slot size calculated?
I have a question for this old blog/Q&A. Above, Jarrod asked, "If you have no CPU or Memory reservations in the virtual infrastructure, is the slot size calculated using the highest configured CPU and RAM for a virtual machine?" Duncan replied, "No it is calculated using a default: 256MHz for CPU (4.1 and prior) or 32MHz for CPU (5.0 and above) and the memory overhead". We're preparing for a 4.1 to 5.5 upgrade, including a restructuring of how guests are arranged on hosts/clusters. We're trying to eliminate the rare instances of reservations so we can plan/predict the slot sizes for the new cluster configurations. If I can collect a spreadsheet of the VMs we'll have on the hosts in a cluster, how do I predict the slot size including this memory overhead?
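A rough sketch of that prediction: with no CPU or memory reservations, the memory slot size reduces to the largest per-VM memory overhead, which depends on each VM's configured memory and vCPU count. The overhead figures themselves must be looked up in VMware's documented per-version overhead tables; the lookup function below is a placeholder, not real numbers:

```python
# Placeholder overhead estimate: real values come from the per-version
# "Overhead Memory on Virtual Machines" tables in VMware's documentation.
def mem_overhead_mb(configured_mem_mb, num_vcpus):
    return 120 + configured_mem_mb * 0.01 + num_vcpus * 30  # illustrative only

# Spreadsheet rows: (reservation_mb, configured_mem_mb, vcpus)
vms = [(0, 4096, 2), (0, 8192, 4), (1024, 2048, 1)]

# Memory slot size = largest (reservation + overhead) across all VMs
mem_slot_mb = max(res + mem_overhead_mb(cfg, vcpu) for res, cfg, vcpu in vms)
print(round(mem_slot_mb))
```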