Last week at VMworld and on the VMTN community I received a couple of questions about resource management and HA Admission Control. It appears people were using HA Admission Control to manage resources within their environment. In other words, the number of VMs that HA would allow you to restart was being used as the basis for resource management. But is that what you should do?
If you look at how HA works and what HA is intended to do, the short answer is no. The reason is that HA is all about getting your virtual machines up and running again. If you look at HA Admission Control in vSphere 5.0 you will quickly see that, for instance, the default value for CPU has been decreased from 256MHz to 32MHz (when no CPU reservations are specified, that is). In many scenarios virtual machines will consume and demand more than that. Another thing to point out is that when no memory reservation is specified, the memory overhead of the VM is used. These values are more than likely much lower than what your virtual machine currently consumes or demands. The thing to keep in mind is that these CPU and memory values only represent what HA needs in order to power on your virtual machines.
If you want to manage resources, avoid severe overcommitment, or guarantee a certain level of performance, you should start looking at the DRS statistics and explore tools like VC Ops, Cap IQ… Don’t (ab)use vSphere HA for this; it is not designed to solve that problem. One thing to consider, though, is increasing the minimum values for slot sizes to avoid scenarios where environments become fully overloaded. If you have a consolidation ratio in mind, it should be fairly simple to figure out which values to use:
available memory resource per host / consolidation ratio = das.vmMemoryMinMB
available CPU resource per host / consolidation ratio = das.vmCpuMinMHz
I am not saying that you should do this, but I think it might not be a bad practice in environments where multiple people have access to vCenter and can deploy VMs. At least people will be alerted when you are running out of “slots” to start VMs.
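The slot-size formulas above can be sketched in a few lines of code. This is a hypothetical illustration, not a VMware tool: the function name and the host capacity figures are my own, and you would plug in your actual per-host resources and target consolidation ratio.

```python
# Hypothetical sketch: deriving minimum HA slot sizes from a target
# consolidation ratio, per the formulas above. Host capacities below
# are example figures, not measured values.

def min_slot_sizes(host_memory_mb, host_cpu_mhz, consolidation_ratio):
    """Return (das.vmMemoryMinMB, das.vmCpuMinMHz) for one host."""
    return (host_memory_mb // consolidation_ratio,
            host_cpu_mhz // consolidation_ratio)

# Example: a host with 96 GB RAM and 16 cores at 2.4 GHz,
# targeting 40 VMs per host.
mem_min, cpu_min = min_slot_sizes(96 * 1024, 16 * 2400, 40)
print(mem_min)  # candidate value for das.vmMemoryMinMB
print(cpu_min)  # candidate value for das.vmCpuMinMHz
```

With those example numbers you would end up with a roughly 2.4 GB / 960 MHz slot, meaning HA Admission Control would start warning once the cluster approaches the intended VM-per-host density.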
Yann Bizeul says
This is what always puzzled me. It would be so easy for VMware to provide a simple dynamic warning when resources are overcommitted, simply by looking at the resource usage/availability ratio.
I think I already know the answer “it is more complicated than that”.
Yeah, whatever. What do I say to a customer who needs to check if their cluster is well dimensioned? I don’t tell them to spend $$$ on third-party products or VMware add-ons; I simply tell them: look at resource usage on your nodes, and anticipate what you’d need in case of a most-used-node failure.
So I know the formula “ConsumedMemory / (TotalClusterMemory – BiggestHostMemory)” isn’t the golden value, but it would certainly be useful.
I fully agree with Duncan. This is where good design and best practices come in. Designs should cater for capacity growth as well as host failure.
The setting is meant to be overridable so that adverse situations can be resolved, after which you revert to a protected configuration.
Craig Risinger says
HA Admission Control is half-assed capacity management–at best.
HA is for restarting VMs in emergencies. Capacity Management is separate.
Capacity Management is processes and tools, set up when everything’s working, to ensure good performance (even with expectable component failures). HA is just a tool for bringing stuff back up in bad times.
Realize that Admission Control can prevent VMs from restarting in an emergency. Don’t tie HA’s hands. Capacity Management is difficult to do well, but it’s easy to do better than HA Admission Control does it.
That is just my opinion, of course.