This week one of my colleagues asked me a question that had me puzzled for a while. A customer had an HA-enabled cluster and used “Host Failures Cluster Tolerates” as the admission control policy. As you hopefully all know, this policy uses a slot algorithm. In short:
HA uses the highest CPU reservation of any given VM and the highest memory reservation of any given VM. If no CPU reservation higher than 256 MHz is set, HA uses a default of 256 MHz for CPU. For memory, the default is 0 MB plus the memory overhead.
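To make the rule above concrete, here is a minimal Python sketch of the slot size calculation, assuming vSphere 4 behavior with no advanced HA options set. The `VM` class, its field names and the sample VMs are hypothetical, and the sketch assumes the memory slot is the highest "reservation + overhead" of any single powered-on VM.

```python
from dataclasses import dataclass

# Default CPU slot size used when no VM has a higher CPU reservation.
DEFAULT_CPU_SLOT_MHZ = 256


@dataclass
class VM:
    # Hypothetical representation of a powered-on VM.
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int


def slot_size(powered_on_vms):
    """Return (cpu_slot_mhz, mem_slot_mb) for a list of powered-on VMs."""
    # CPU: highest CPU reservation of any VM, never below the 256 MHz default.
    cpu_slot = max([DEFAULT_CPU_SLOT_MHZ] + [vm.cpu_reservation_mhz for vm in powered_on_vms])
    # Memory: highest reservation + overhead of any VM
    # (0 MB + overhead when nothing is reserved).
    mem_slot = max(
        (vm.mem_reservation_mb + vm.mem_overhead_mb for vm in powered_on_vms),
        default=0,
    )
    return cpu_slot, mem_slot


if __name__ == "__main__":
    vms = [VM("web01", 0, 0, 149), VM("db01", 500, 0, 120), VM("app01", 0, 0, 98)]
    print(slot_size(vms))  # (500, 149)
```

Note that a single VM with a large reservation drives the slot size for the whole cluster, which is why one big reservation can drastically reduce the number of available slots.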
In their case they ended up with a memory slot size of 405 MB. However, after checking the memory overhead of all virtual machines, they found that the largest memory overhead was 149 MB. So where did this 405 MB come from? Luckily one of the engineers responded to the email thread and cleared things up. With vCenter 2.5 we used a default slot size of 256 MB for memory. This default is configured in “vpxd.cfg”, and unfortunately the setting is not reset when you upgrade from vCenter 2.5 to vCenter 4.0. For this customer, that meant the result was:
256 MB (default slot size) + 149 MB (dynamic memory overhead) = 405 MB
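The arithmetic above can be reproduced with a small Python sketch. The function name and parameters are hypothetical, and it assumes the leftover minimum acts as a floor on the reservation term with the memory overhead added on top, which is consistent with the 256 + 149 = 405 result but is not taken from VMware code.

```python
def memory_slot_mb(highest_reservation_mb, highest_overhead_mb, slot_mem_min_mb=0):
    # Assumption: the configured minimum (slotMemMinMB) is a floor on the
    # reservation term, and the VM memory overhead is added on top of it.
    return max(highest_reservation_mb, slot_mem_min_mb) + highest_overhead_mb


# Fresh vCenter 4.0 install: no minimum configured, so 0 MB + overhead.
print(memory_slot_mb(0, 149))       # 149
# Upgraded from vCenter 2.5: the 256 MB minimum is still in vpxd.cfg.
print(memory_slot_mb(0, 149, 256))  # 405
```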
Although it is a minor issue, it is definitely something to keep in mind when troubleshooting HA slot size issues. Always check vpxd.cfg for any value defined for “<slotMemMinMB>”.
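If you want to script that check, here is a minimal Python sketch that reads vpxd.cfg as XML and reports any `<slotMemMinMB>` values. The script name, the command-line usage and the element nesting in the sample are assumptions; the search uses `iter()` so it finds the element wherever it sits in the file. It only reads the file and does not change anything.

```python
import sys
import xml.etree.ElementTree as ET


def find_slot_mem_min(cfg_xml):
    """Return every <slotMemMinMB> value found in a vpxd.cfg document (str or bytes)."""
    root = ET.fromstring(cfg_xml)
    # iter() walks the whole tree, so the exact nesting of the element does not matter.
    return [
        int(el.text.strip())
        for el in root.iter("slotMemMinMB")
        if el.text and el.text.strip()
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_vpxd.py <path to vpxd.cfg>
    with open(sys.argv[1], "rb") as f:  # bytes, so the XML declaration decides the encoding
        values = find_slot_mem_min(f.read())
    if values:
        print("slotMemMinMB is set:", values)
    else:
        print("No slotMemMinMB value defined")
```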
Linus B. says
Just to be clear: it should have been 0 + overhead, but it wasn’t because of the upgrade from 2.5?
That is correct. Although the “default behavior” changed, this only applies to newly installed environments, because the installer does not change the existing setting during an upgrade.
I’ve always had a question about the slot size algorithm. Could you clarify it or point me in the right direction? My question: why does HA use the highest CPU reservation of “any given VM” and the highest memory reservation of “any given VM”, and not of ALL VMs in the cluster? Doesn’t the remaining host in the cluster (or hostS) need enough resources for ALL VMs, and not just the ones that are powered on? More specifically, doesn’t it need room for the VM with the highest CPU reservation and memory overhead?